Abstract. We present here novel insight into exchange-correlation functionals in density functional theory, based on the viewpoint of optimal transport. We show that in the case of two electrons and in the semiclassical limit, the exact exchange-correlation functional reduces to a very interesting functional of novel form, which depends on an optimal transport map $T$ associated with a given density $\rho$. Since the above limit is strongly correlated, the limit functional yields insight into electron correlations. We prove the existence and uniqueness of such an optimal map for any number of electrons and each $\rho$, and determine the map explicitly in the case when $\rho$ is radially symmetric.
1 Introduction

The precise modelling of electron correlations continues to constitute the major obstacle in developing high-accuracy, low-cost methods for electronic structure computations in molecules and solids. In this article we shed new light on the longstanding problem of how to accurately incorporate electron correlation into density functional theory (DFT), by deriving and analyzing the semiclassical limit of the exact Hohenberg-Kohn functional with the single-particle density $\rho$ held fixed. In this limit, we find that the exact functional reduces formally to a very interesting functional of novel form which depends on an optimal transport map $T$ associated with a given density $\rho$. Our work thereby links DFT, which is a large and very active research area in physics and chemistry [PY95, FNM03, Ra09], for the first time to optimal transportation theory, which has recently become a very active area in mathematics [GM96, Ru96, Vill09].

In optimal transportation theory the goal is to transport "mass" from an initial density $\rho_A$ to a target density $\rho_B$ in such a way that the "cost" $c(x,y)$ for transporting mass from $x$ to $y$ is minimized. Mathematically, this means that one minimizes a cost functional $\int c(x,y)\,d\gamma(x,y)$ over a set of joint measures $\gamma$ (in physics terminology: pair densities) subject to fixed marginals (single-particle densities). See below for a precise formulation. The main mathematical novelty of the optimal transportation problem arising from DFT,

Minimize $\int_{\mathbb{R}^6} \frac{1}{|x-y|}\,d\gamma(x,y)$ subject to equal marginals $\rho$, (1.1)

is that the cost, which is given by the Coulomb law $c(x,y) = 1/|x-y|$, decreases rather than increases with distance and has a singularity on the diagonal.

Our goals in this paper are 

(i) to prove that for any given single-particle density $\rho$, the optimal transportation problem with Coulomb cost possesses a unique minimizer which is given by an optimal transport map $T_\rho$ associated with $\rho$. (It is well known that uniqueness is false for the seemingly simpler cost function $c(x,y) = |x-y|$.)

(ii) to derive an explicit formula for the optimal map in the case when $\rho$ is radially symmetric. (Note that in physics, radial densities arise as atomic ground state densities for many elements such as He, Li, N, Ne, Na, Mg, Cu.)

(iii) to prove that DFT with electron interaction energy given by the optimal transportation cost $E_{OT}[\rho]$ (defined as the minimum cost in (1.1)) is the semiclassical limit of exact Hohenberg-Kohn DFT in case of two electrons, and establish basic properties such as that it is a rigorous lower bound to exact DFT for any number of electrons.

We do not know whether our semiclassical limit result remains true for a general number of electrons. 
As explained in Section 5, this question is related to the representability problem for two-particle density 
matrices [CY02]. 

To prove (i) and (ii) we adapt geometric methods as developed in [GM96], [Ru96], and [SK92] for cost functions that increase with distance. In our case of decreasing cost functions, one can still geometrically construct a potential which specifies both in which direction and how far to move the mass of $\rho_A$ which sits near $x$. To prove (iii) we will need to make modifications to the optimal transport plan which yields $E_{OT}[\rho]$, since any wave function whose pair density is given by the optimal plan has infinite kinetic energy. The main technical idea here is a construction to re-instate the original marginals after smoothing.

The optimal transportation functional $E_{OT}[\rho]$ which emerges as a limit of the Hohenberg-Kohn functional should be viewed as a natural "opposite" of the well known mean field functional $J[\rho] = \frac{1}{2}\int_{\mathbb{R}^6} \frac{\rho(x)\rho(y)}{|x-y|}\,dx\,dy$: it arises in a strongly correlated rather than a de-correlated limit, thereby yielding valuable qualitative insight into electron correlations. We also believe that $E_{OT}[\rho]$ has a role to play in the design of quantitative competitors to existing exchange-correlation functionals: it provides an alternative starting point of novel functional form for designing approximations, and - just like the mean field functional - could be incorporated as an ingredient into hybrid functionals. Basic quantitative issues are addressed in the companion paper [CFK11].

This paper is structured as follows. In Section 2 we discuss density functional theory from a mathematical perspective. In Section 3 we introduce optimal transport theory, prove in Theorem 3.6 the main result of the section, the uniqueness of the optimal transport map, and establish some of its general properties. In Section 4 we give in Theorem 4.10 an explicit formula for the optimal map for equal, radially symmetric marginals. In Section 5 we compare the optimal transportation cost $E_{OT}[\rho]$ to the exact Hohenberg-Kohn functional, and show that it is its semiclassical limit in the case of two electrons (Theorem 5.2), as well as a lower bound for any number of electrons (also established in Section 5).



2 Density functional theory 

Density functional theory (DFT) was introduced by Hohenberg, Kohn and Sham in the 1960s in two fundamental papers [HK64, KS65], as an approximate computational method for solving the many-electron Schrödinger equation whose computational cost remains feasible even for large systems. In this theory, one only computes the single-particle density instead of the full wave function. In order to obtain a closed equation, a closure assumption is needed which expresses the pair density in terms of the single-particle density. A simple "independence" ansatz has turned out to be far too crude in practice. A huge effort has gone into developing corrections which account for the failure of independence [PY95, FNM03, Ra09].

Our goal in this section is to discuss DFT from a mathematical perspective. 

The key quantity DFT aims to predict is the ground state energy $E_0$ of a molecule as a function of the atomic positions. From this, further properties can be readily extracted; for instance, in order to determine a molecule's stable equilibrium shapes one minimizes $E_0$ (locally or globally) over atomic positions.

The starting point for developing DFT models is the "exact" (non-relativistic, Born-Oppenheimer) quantum mechanical ground state energy $E_0^{QM}$. The definition contains some details which may look a bit complicated to readers not familiar with many-particle quantum theory, but the basic mathematical structure relevant to developing DFT models is simple. $E_0^{QM}$ is the minimum value of a suitable energy functional $\mathcal{E}$ over a suitable class of functions $\mathcal{A}$.



2.1 Exact ground state energy 

The detailed definition is as follows. Consider a molecule with atomic nuclei at positions $R_1,..,R_M \in \mathbb{R}^3$, with charges $Z_1,..,Z_M \in \mathbb{N}$ ($Z = 1$ means hydrogen, $Z = 2$ Helium, $Z = 3$ Lithium, $Z = 4$ Beryllium, $Z = 5$ Boron, $Z = 6$ Carbon, $Z = 7$ Nitrogen, $Z = 8$ Oxygen, and so on), and with $N$ electrons. The energy functional depends on the positions and charges of the nuclei only through the ensuing Coulomb potential

$v(x) = -\sum_{\alpha=1}^{M} \frac{Z_\alpha}{|x-R_\alpha|}$. (2.1)

For the discussion below, the potential $v$ can more generally be any function in the space $L^{3/2}(\mathbb{R}^3)+L^\infty(\mathbb{R}^3) = \{v_1+v_2 \mid v_1\in L^{3/2}(\mathbb{R}^3),\ v_2\in L^\infty(\mathbb{R}^3)\}$. The class of functions $\mathcal{A}$ is given by

$\mathcal{A} = \{\Psi\in L^2((\mathbb{R}^3\times\mathbb{Z}_2)^N;\mathbb{C}) \mid \nabla\Psi\in L^2,\ \Psi \text{ antisymmetric},\ \|\Psi\|_{L^2}=1\}$. (2.2)



(Here antisymmetric means $\Psi(z_{\sigma(1)},..,z_{\sigma(N)}) = \mathrm{sgn}(\sigma)\,\Psi(z_1,..,z_N)$ for all permutations $\sigma$, where $z_1,..,z_N\in\mathbb{R}^3\times\mathbb{Z}_2$ are space-spin-coordinates for the $N$ electrons. Spin will not play a big role in the sequel, but the fact that the functions $\Psi$ depend on all the positions of all the electrons is important. It leads to the fact that we will have to deal with "N-point distributions".) Elements of $\mathcal{A}$ are called (N-electron) wave functions.

The energy functional is given by 

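$\mathcal{E}^{QM}[\Psi] = T[\Psi] + V_{ne}[\Psi] + V_{ee}[\Psi]$ (2.3)

where (employing the notation $z_i = (x_i,s_i)\in\mathbb{R}^3\times\mathbb{Z}_2$, $\int dz_i = \sum_{s_i\in\mathbb{Z}_2}\int_{\mathbb{R}^3} dx_i$)

$T[\Psi] = \frac{1}{2}\int\cdots\int \sum_{i=1}^{N} |\nabla_{x_i}\Psi(x_1,s_1,..,x_N,s_N)|^2\,dz_1..dz_N$

is the kinetic energy,

$V_{ne}[\Psi] = \int\cdots\int \sum_{i=1}^{N} v(x_i)\,|\Psi(x_1,s_1,..,x_N,s_N)|^2\,dz_1..dz_N$

is the electron-nuclei interaction energy, and

$V_{ee}[\Psi] = \int\cdots\int \sum_{1\le i<j\le N} \frac{1}{|x_i-x_j|}\,|\Psi(x_1,s_1,..,x_N,s_N)|^2\,dz_1..dz_N$

is the electron-electron interaction energy.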

The quantum mechanical ground state energy is defined as 

$E_0^{QM} = \inf_{\Psi\in\mathcal{A}} \mathcal{E}^{QM}[\Psi]$. (2.4)

In the usual case where $v$ is given by (2.1) and $N = \sum_{\alpha=1}^{M} Z_\alpha$ (neutral molecules), it is a basic theorem due to Zhislin that the infimum is attained. For a simple proof see [Fr03].

Since the energy functional is a quadratic form, we could equivalently have defined $E_0^{QM}$ as the lowest eigenvalue of the corresponding linear partial differential operator $-\frac{1}{2}\Delta + \sum_{i=1}^{N} v(x_i) + \sum_{i<j}\frac{1}{|x_i-x_j|}$. This formulation is useful for many other purposes, but - unlike (2.4) - does not play an important role in DFT.

2.2 Probabilistic interpretation; marginals 

The absolute value squared of $\Psi$ can be interpreted as an $N$-point probability distribution,

$|\Psi(x_1,s_1,..,x_N,s_N)|^2$ = probability density that the electrons are at positions $x_i$ with spins $s_i$.

In quantum mechanics this is known as the Born interpretation. Note that the above function is nonnegative and integrates to 1, due to the requirement that $\Psi$ has norm 1. (In fact this was the physical motivation for this requirement.)

Various partial marginals will play an important role. First, by integrating out the spins we obtain the $N$-point position density:

$\rho_N^\Psi(x_1,..,x_N) = \sum_{s_1,..,s_N\in\mathbb{Z}_2} |\Psi(x_1,s_1,..,x_N,s_N)|^2$. (2.5)

Next, by integrating out all but two respectively one electron positions we obtain the pair density and the single particle density:

$\rho_2^\Psi(x_1,x_2) = \binom{N}{2}\int_{\mathbb{R}^{3(N-2)}} \rho_N^\Psi(x_1,..,x_N)\,dx_3..dx_N$, (2.6)

$\rho^\Psi(x_1) = N\int_{\mathbb{R}^{3(N-1)}} \rho_N^\Psi(x_1,..,x_N)\,dx_2..dx_N$. (2.7)

For the rest of this section, we drop the superscript $\Psi$ from $\rho^\Psi$ and $\rho_2^\Psi$.

The normalization factors are a convention in quantum mechanics so that $\rho$ integrates to the number of particles and $\rho_2$ to the number of pairs in the system. (In Section 3 we find it convenient to work with the corresponding probability densities, normalized so as to integrate to 1.) With the above conventions, the important fact that $\rho$ is a marginal distribution of $\rho_2$ takes the form

$\frac{N}{\binom{N}{2}}\int \rho_2(x,y)\,dy = \rho(x), \qquad \frac{N}{\binom{N}{2}}\int \rho_2(x,y)\,dx = \rho(y)$. (2.8)

The relevance of $\rho$ and $\rho_2$ for determining the ground state energy (2.4) comes from the fact that the electron-nuclei energy $V_{ne}$ and the electron-electron energy $V_{ee}$ in (2.3) depend only on these.



Lemma 2.1. With the above definitions, for any $\Psi\in\mathcal{A}$ we have

$V_{ne}[\Psi] = \int_{\mathbb{R}^3} v(x)\,\rho(x)\,dx, \qquad V_{ee}[\Psi] = \int_{\mathbb{R}^6} \frac{1}{|x-y|}\,\rho_2(x,y)\,dx\,dy$. (2.9)

Proof. This follows from definitions (2.6), (2.7) and the fact that due to the antisymmetry of $\Psi$, $|\Psi(z_1,..,z_N)|^2$ is a symmetric function of the $z_i$. □

In the sequel we write $V_{ne}[\rho]$, $V_{ee}[\rho_2]$ instead of $V_{ne}[\Psi]$, $V_{ee}[\Psi]$.

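Also, we note that the space of densities arising from functions $\Psi\in\mathcal{A}$,

$\mathcal{R} := \{\rho:\mathbb{R}^3\to\mathbb{R} \mid \rho \text{ is the density (2.7) of some } \Psi\in\mathcal{A}\}$, (2.10)

is known explicitly: by a result of Lieb [Li83],

$\mathcal{R} = \{\rho:\mathbb{R}^3\to\mathbb{R} \mid \rho\ge 0,\ \sqrt{\rho}\in H^1(\mathbb{R}^3),\ \int\rho(x)\,dx = N\}$, (2.11)

where $H^1(\mathbb{R}^3)$ is the usual Sobolev space $\{u\in L^2(\mathbb{R}^3) \mid \nabla u\in L^2(\mathbb{R}^3)\}$.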



2.3 Universal Hohenberg-Kohn functional 

The expression for $V_{ne}$ derived in Lemma 2.1 leads to the following well known partitioning of the minimization (2.4) into a double minimization (first minimize over $\Psi$ subject to fixed $\rho$, then over $\rho$):

$E_0^{QM} = \inf_{\rho\in\mathcal{R}} \Big\{ F_{HK}[\rho] + \int_{\mathbb{R}^3} v(x)\,\rho(x)\,dx \Big\}$ (2.12)

with

$F_{HK}[\rho] := \inf_{\Psi\in\mathcal{A},\ \Psi\mapsto\rho} \big\{ T[\Psi] + V_{ee}[\Psi] \big\}$. (2.13)

Here and below, the notation $\Psi\mapsto\rho$ means that $\Psi$ has single-particle density $\rho$. Note that $F_{HK}$ is a universal functional of $\rho$, in the sense that it does not depend on the external potential $v$, and is called the Hohenberg-Kohn functional. It is defined on the admissible set (2.10). The above constrained-search definition of $F_{HK}$ is due to Levy and Lieb [Le79, Li83]; in the original Hohenberg-Kohn paper [HK64] the functional was constructed in a more indirect and slightly less general way, requiring that $\rho$ be the density of some $\Psi$ which is a non-degenerate ground state of $\mathcal{E}^{QM}$ for some potential $v$.



2.4 Exchange-correlation functionals 



The problem with definition (2.4) of $E_0^{QM}$, as well as definition (2.13) of the 'exact' density functional $F_{HK}$, is that it is infeasible in practice except when the number $N$ of particles is very small. This is due to the so-called problem of exponential scaling: the functions over which one minimizes are functions on $\mathbb{R}^{3N}$, and the discretization of $\mathbb{R}^{3N}$ requires a $K^N$-point grid if the single-particle space $\mathbb{R}^3$ is discretized by a $K$-point grid.

This problem would disappear if we could accurately approximate $V_{ee}$ in the variational principle (2.4) by a functional $\tilde V_{ee}$ of $\rho$ instead of $\rho_2$,

$V_{ee}[\rho_2] \approx \tilde V_{ee}[\rho]$. (2.14)

(Why this is so is not completely trivial, since there remains $T$ to deal with, but see eq. (2.17) below.) Thus in DFT one approximates the variational principle for the ground state energy $E_0^{QM}$ by:



$E_0^{DFT} = \inf_{\Psi\in\mathcal{A}} \big\{ T[\Psi] + V_{ne}[\rho] + \tilde V_{ee}[\rho] \big\}$. (2.15)

Physically, this means (in the light of Lemma 2.1) that in DFT, interactions of electrons with an external environment, such as the Coulomb forces exerted by an array of atomic nuclei, are included exactly, but electron-electron interactions have to be suitably "modelled". By partitioning the minimization in (2.15) analogously to (2.12), (2.13), $E_0^{DFT}$ can be obtained by minimization of a functional of $\rho$ alone,

$E_0^{DFT} = \inf_{\rho\in\mathcal{R}} \big\{ T_{QM}[\rho] + V_{ne}[\rho] + \tilde V_{ee}[\rho] \big\}, \qquad T_{QM}[\rho] := \inf_{\Psi\in\mathcal{A},\ \Psi\mapsto\rho} T[\Psi]$. (2.16)



The minimization over the "large" space $\mathcal{A}$ of functions on $(\mathbb{R}^3\times\mathbb{Z}_2)^N$ in (2.16) can now be replaced by a minimization over a much "smaller" space. As can be shown with the help of reduced density matrices [CY02], and abbreviating $\int = \sum_{s\in\mathbb{Z}_2}\int_{\mathbb{R}^3}$,

$T_{QM}[\rho] = \inf\Big\{ \sum_{i=1}^{\infty} \lambda_i \int \tfrac{1}{2}|\nabla\phi_i|^2 \ \Big|\ 0\le\lambda_i\le 1,\ \sum_{i=1}^{\infty}\lambda_i = N,\ \phi_i\in H^1(\mathbb{R}^3\times\mathbb{Z}_2),\ \int \phi_i\overline{\phi_j} = \delta_{ij},\ \sum_{i=1}^{\infty}\sum_{s\in\mathbb{Z}_2}\lambda_i|\phi_i(x,s)|^2 = \rho(x) \Big\}$. (2.17)

After truncating the sum after an appropriate number $i_{max}$ of terms (the standard truncation being $i_{max} = N$, yielding the Kohn-Sham kinetic energy functional [KS65]) and discretizing $\mathbb{R}^3$ by a $K$-point grid, the number of degrees of freedom of the right hand side scales linearly instead of exponentially in $N$.

By means of this fact, the task of eliminating the exponential complexity of (2.4) is reduced to the following



Fundamental problem of DFT. Design accurate approximations of the form (2.14). In other words, approximate a simple explicit functional of the pair density $\rho_2$, a function on $\mathbb{R}^6$, by a functional of its joint right and left marginal $\rho$, a function on $\mathbb{R}^3$. Note that the approximations only need to be accurate for single-particle densities and pair densities of ground states of molecules, not arbitrary states. Elsewhere it suffices that the approximations give a reasonably good lower bound so as to avoid spurious minimizers.
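
To make the scaling argument concrete, the following minimal Python sketch compares the size of a full $K^N$-point grid with a rough count of the values stored for $N$ orbitals on a $K$-point grid. The function names and the count $2NK$ (one value per orbital, grid point and spin state) are illustrative assumptions of this sketch, not quantities defined in the text.

```python
def full_grid_size(K: int, N: int) -> int:
    """Grid points needed to discretize R^{3N} when R^3 is discretized by K points."""
    return K ** N


def orbital_dof(K: int, N: int) -> int:
    """Rough count of values for N orbitals on a K-point grid with 2 spin states.

    Assumption of this sketch: one stored value per orbital, grid point and spin.
    """
    return 2 * N * K


if __name__ == "__main__":
    # Exponential versus linear growth in the number of electrons N.
    for N in (2, 5, 10):
        print(N, full_grid_size(100, N), orbital_dof(100, N))
```

Even for a coarse grid with $K = 100$, the full grid for $N = 10$ already has $10^{20}$ points, while the orbital count grows only proportionally to $N$.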

Example 2.2. (statistical independence) The simplest idea would be to assume statistical independence,

$\rho_2(x,y) \approx \frac{1}{2}\rho(x)\rho(y)$ (2.18)

(the factor 1/2 coming from the normalization factors in (2.7), (2.6)), and substitute this ansatz into the formula for $V_{ee}$ derived in Lemma 2.1. This leads to taking

$\tilde V_{ee}[\rho] = \frac{1}{2}\int_{\mathbb{R}^6} \frac{1}{|x-y|}\,\rho(x)\rho(y)\,dx\,dy =: J[\rho]$, (2.19)

i.e. $V_{ee}$ is replaced by the Coulomb self-repulsion of the single-particle density. The above mean field functional appears, for instance, in Thomas-Fermi theory.

In modern DFT, the very naive ansatz (2.18) was never used, but - without this being natural from a probabilistic point of view - the convention is to include corrections to it additively, i.e. one makes an ansatz

$\tilde V_{ee}[\rho] = J[\rho] + E_{xc}[\rho]$, (2.20)

the additive correction being called an exchange-correlation functional. This notational convention should not, of course, prevent us from contemplating non-additive modifications of (2.18) and (2.19). □

Example 2.3. (correctly normalized mean field) Let $\rho_2 = \binom{N}{2}\gamma$ and $\rho = N\mu$, so that $\gamma$ and $\mu$ have integral 1. Then the independence ansatz $\gamma\approx\mu\otimes\mu$ reads

$\frac{\rho_2}{\binom{N}{2}} \approx \frac{\rho\otimes\rho}{N^2}$,

which is equivalent to

$\rho_2 \approx \frac{1}{2}\Big(1-\frac{1}{N}\Big)\rho\otimes\rho$,

where here and below we use the notation $(\rho\otimes\rho')(x,y) = \rho(x)\rho'(y)$, corresponding to the product measure when interpreting $\rho$, $\rho'$ as measures.

Note that physicists and chemists use

$\rho_2 \approx \frac{1}{2}\rho\otimes\rho$,

which is justified in the context of macroscopic systems such as an electron gas (where one has taken a limit $N\to\infty$), but less natural in the context of DFT for atoms and molecules. □
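
The normalization in this example can be checked on a discrete toy density. The sketch below is a minimal illustration, assuming a density given by a list of weights summing to $N$; the function names are hypothetical. It builds $\rho_2 = \binom{N}{2}\,\mu\otimes\mu$ with $\mu = \rho/N$ and recovers $\rho$ through the marginal relation with prefactor $N/\binom{N}{2}$.

```python
from math import comb, isclose


def independent_pair_density(rho, N):
    """Discrete pair density rho2 = C(N,2) * (mu x mu), where mu = rho / N."""
    mu = [r / N for r in rho]
    return [[comb(N, 2) * mx * my for my in mu] for mx in mu]


def marginal(rho2, N):
    """Recover rho(x) as (N / C(N,2)) * sum over y of rho2(x, y)."""
    return [N / comb(N, 2) * sum(row) for row in rho2]


if __name__ == "__main__":
    N = 3
    rho = [1.5, 1.0, 0.5]  # toy density, sums to N
    rho2 = independent_pair_density(rho, N)
    print(sum(map(sum, rho2)))   # total number of pairs, C(N,2)
    print(marginal(rho2, N))     # reproduces rho up to rounding
```

Each entry of `rho2` equals $\frac{1}{2}(1-\frac{1}{N})\rho(x)\rho(y)$, which is the correctly normalized form rather than the macroscopic $\frac{1}{2}\rho(x)\rho(y)$.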

Example 2.4. (local density approximation) In a model system, the so-called free electron gas, the pair density can be determined explicitly [Fr97]. In this case the single-particle density is a constant,

$\rho(x) = \bar\rho$, (2.21)

and the pair density can be determined to be

$\rho_2(x,y) = \frac{\bar\rho^2}{2}\Big(1 - \frac{1}{q}\,h\big((3\bar\rho\pi^2)^{1/3}|x-y|\big)^2\Big)$, (2.22)

where

$h(s) = \frac{3(\sin s - s\cos s)}{s^3}$

and $q$ is the number of spin states.

In particular, at long range $|x-y|\to\infty$ statistical independence is correct, but at short range $|x-y|\to 0$, $\rho_2$ tends to zero in the case of a single spin state, i.e. it vanishes on the diagonal $x = y$, and to half the size of a statistically independent sample in the (physical) case of two spin states. Substituting the result (2.22) into the formula for $V_{ee}[\rho_2]$ leads to the so-called local density approximation

$\tilde V_{ee}[\rho] = J[\rho] - c_x\int_{\mathbb{R}^3} \rho(x)^{4/3}\,dx$, (2.23)

where $c_x = \frac{3}{4}\big(\frac{3}{\pi}\big)^{1/3}$. As a heuristic approximation to $V_{ee}$, this formula goes back to Dirac and Bloch (for a rigorous justification see [Fr97]). It was widely used in the early days of DFT, following [KS65]. □

Example 2.5. (B3LYP) Current functionals used in practice, e.g. the 'B3LYP' functional of Becke, Lee, Yang and Parr [Be93, LYP88], rely on - from a mathematical point of view questionable - guesses of functional forms (e.g. local in $\rho$, or local in $\rho$ and $\nabla\rho$), additional terms depending non-locally on the orbitals in (2.17), and careful fitting of parameters to experimental or high-accuracy computational data. The resulting expressions are a little too complicated to write down here. They have led to an accuracy improvement for $E_0^{QM}$ over the local density approximation of an order of magnitude or so, but not more, with little progress in the last decade despite continuing effort. □



3 Optimal transportation for DFT 



We begin with a basic observation. The weight factor in front of $\rho_2$ in $V_{ee}$ in (2.9) is always positive, and largest on the diagonal $x = y$, so even "complete anticorrelation" might be a better ansatz than independence (keeping in mind that the states on which the ansatz needs to be good are the minimizers of a functional which includes $V_{ee}$).

In fact, such a complete anticorrelation is exactly what emerges when one starts from the exact Hohenberg-Kohn functional $F_{HK}$, inserts a semiclassical factor $\hbar^2$ in front of the kinetic energy functional $T$ in (2.13) and passes to the semiclassical limit $\hbar\to 0$. In this limit, the Hohenberg-Kohn functional $F_{HK}$ reduces to the following functional obtained by a minimization over pair densities instead of wave functions,

$F[\rho] = \inf_{\rho_2\in\mathcal{R}_2,\ \rho_2\mapsto\rho} \int_{\mathbb{R}^6} \frac{1}{|x-y|}\,\rho_2(x,y)\,dx\,dy$. (3.1)

Here $\rho_2\mapsto\rho$ means that $\rho_2$ satisfies eq. (2.8), i.e. it has right and left marginal $\rho$, and the set $\mathcal{R}_2$ of admissible pair densities is the image of $\mathcal{A}$ under the map $\Psi\mapsto\rho_2$. Unlike the corresponding admissible set of single-particle densities, $\mathcal{R}_2$ is not known explicitly (this is a variant of the representability problem for two-particle density matrices [CY02]).

Formally ignoring this point, and discarding in particular the smoothness restriction (which can be proved analogously to the proof in [Li83] of (2.11)) that

$\rho_2\in\mathcal{R}_2 \ \Longrightarrow\ \sqrt{\rho_2}\in H^1(\mathbb{R}^6)$, (3.2)

the above functional $F$ reduces to the functional

$E_{OT}[\rho] = \inf_{\rho_2\in\mathcal{M}_+(\mathbb{R}^6),\ \rho_2\mapsto\rho} C[\rho_2], \qquad C[\rho_2] = \int_{\mathbb{R}^6} \frac{1}{|x-y|}\,d\rho_2(x,y)$, (3.3)

where $\mathcal{M}_+$ denotes the set of (nonnegative) Radon measures on $\mathbb{R}^6$. For a rigorous justification that (3.3) is indeed the correct semiclassical limit of $F_{HK}$ in case $N = 2$ see Section 5.

The variational problem that has appeared here, to minimize a "cost functional" $C$ over a set of joint measures on $\mathbb{R}^{2d}$ subject to fixed marginals, is an optimal transport problem. This type of problem, with "cost functions" such as $|x-y|$ or $|x-y|^2$ instead of $|x-y|^{-1}$, dates back to Monge in 1781 and has a famous history, which is nicely summarized in the very readable paper [GM96], which was our main source when studying the problem (3.3).

Before formulating our particular problem we give some notations and definitions. 

For a set $Z\subset\mathbb{R}^d$, we denote by $\mathcal{P}(Z)$ the set of probability measures on $Z$. If $Z$ is a closed subset of $\mathbb{R}^d$ and $\gamma\in\mathcal{P}(Z)$, then the support of $\gamma$ is the smallest closed set $\mathrm{supp}\,\gamma\subset Z$ of full mass, that is such that

$\gamma(\mathrm{supp}\,\gamma) = \gamma(Z) = 1$.

Suppose $X, Y\subset\mathbb{R}^d$ are closed sets. If $\mu\in\mathcal{P}(X)$ and $\nu\in\mathcal{P}(Y)$, we denote by $\Gamma(\mu,\nu)$ the joint probability measures $\gamma$ on $\mathbb{R}^d\times\mathbb{R}^d$ which have $\mu$ and $\nu$ as their marginals, that is with $\mu(U) = \gamma(U\times\mathbb{R}^d)$ and $\nu(U) = \gamma(\mathbb{R}^d\times U)$ for Borel $U\subset\mathbb{R}^d$. In fact, if $\gamma\in\Gamma(\mu,\nu)$, then $\mathrm{supp}\,\gamma\subset X\times Y$. Typically $\mu$ has density $u_1$ and $\nu$ has density $u_2$, where $u_1$, $u_2$ are functions, in which case we write $\Gamma(u_1,u_2)$.

Remark 3.1. In this section it is convenient to eliminate the prefactors in $\rho_2$ and $\rho$, and to consider the cost functional $C$ on probability measures $\gamma$, with equal marginals $\mu$ (which are again probability measures), with $\rho_2$ in (3.3) corresponding to $\binom{N}{2}\gamma$ and $\rho$ corresponding to $N\mu$ as in Example 2.3.

We first formulate the general problem, which is now called the Kantorovich problem. For some cost function $c:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}\cup\{+\infty\}$ we are interested in minimizing the transport cost

$C[\gamma] := \int_{\mathbb{R}^{2d}} c(x,y)\,d\gamma(x,y)$, (3.4)

among joint measures $\gamma\in\Gamma(\mu,\nu)$, called transport plans, to obtain

$\inf_{\gamma\in\Gamma(\mu,\nu)} C[\gamma]$. (3.5)

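Let $\mathcal{T}(\mu,\nu)$ be the set of Borel maps $T:\mathbb{R}^d\to\mathbb{R}^d$ that push $\mu$ forward to $\nu$, i.e. $T_\#\mu[V] := \mu(T^{-1}(V)) = \nu(V)$ for Borel $V\subset\mathbb{R}^d$. The so-called Monge problem is to minimize

$I[T] = \int_{\mathbb{R}^d} c(x,T(x))\,d\mu(x)$ (3.6)

over maps $T$ in $\mathcal{T}(\mu,\nu)$, called transport maps, to obtain

$\inf_{T\in\mathcal{T}(\mu,\nu)} I[T]$. (3.7)

There is a natural embedding which associates to each transport map $T\in\mathcal{T}(\mu,\nu)$ a transport plan $\gamma_T := (\mathrm{id}\times T)_\#\mu\in\Gamma(\mu,\nu)$ or, in physics notation, $\gamma_T(x,y) := \delta_{T(x)}(y)\,\mu(x)$, where $\mathrm{id}:\mathbb{R}^d\to\mathbb{R}^d$ is the identity map. Since $C[\gamma_T] = I[T]$, we conclude that

$\inf_{\gamma\in\Gamma(\mu,\nu)} C[\gamma] \le \inf_{T\in\mathcal{T}(\mu,\nu)} I[T]$. (3.8)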

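Let $\bar{\mathbb{R}} := \mathbb{R}\cup\{\pm\infty\}$ and endow $\bar{\mathbb{R}}$ with the usual topology so that $c\in C(\mathbb{R}^d\times\mathbb{R}^d;\bar{\mathbb{R}})$ means that $\lim_{(\tilde x,\tilde y)\to(x,y)} c(\tilde x,\tilde y) = c(x,y)$. In particular, if $c(x,y) = +\infty$, then $c(\tilde x,\tilde y)$ tends to $+\infty$ as $(\tilde x,\tilde y)$ tends to $(x,y)$, and similarly for limit $-\infty$.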

Throughout we will work with cost functions $c$ such that $c(x,y) = h(x-y)$ for all $x,y\in\mathbb{R}^d$. We recall the definition of the dual of a convex function and refer to Rockafellar [Ro72] for standard definitions and further background.

Definition 3.2. The dual (or Legendre transform) $h^*:\mathbb{R}^d\to\mathbb{R}\cup\{+\infty\}$ of a convex function $h:\mathbb{R}^d\to\mathbb{R}\cup\{+\infty\}$ is given by

$h^*(y) := \sup_{x\in\mathbb{R}^d}\{\langle x,y\rangle - h(x)\}$. (3.9)

Since our cost functions of interest are not convex, we need to work with the following more subtle definition of Legendre transform.

Definition 3.3. (generalized Legendre transform) Suppose that $l:\mathbb{R}\to\mathbb{R}\cup\{+\infty\}$ is lower semi-continuous and convex. Define $k:\mathbb{R}\to\mathbb{R}\cup\{+\infty\}$ by $k(\lambda) = l(\lambda)$ if $\lambda\ge 0$ and $k(\lambda) = +\infty$ otherwise. Define $h^*:\mathbb{R}^d\to\mathbb{R}\cup\{+\infty\}$ by

$h^*(x) = k^*(-|x|) = \sup_{\beta\in\mathbb{R}}\{-\beta|x| - k(\beta)\}, \quad x\in\mathbb{R}^d$. (3.10)

We define $l^*(\lambda) = k^*(-|\lambda|)$ for $\lambda\in\mathbb{R}$.

Definition 3.4. A function $\psi:\mathbb{R}^d\to\mathbb{R}\cup\{-\infty\}$, not identically $-\infty$, is said to be c-concave if it is the infimum of a family of translates and shifts of $h(x)$: i.e., there is a set $\mathcal{A}\subset\mathbb{R}^d\times\mathbb{R}$ such that

$\psi(x) := \inf_{(y,\lambda)\in\mathcal{A}}\{c(x,y) + \lambda\}, \quad x\in\mathbb{R}^d$. (3.11)

Let $\Delta := \{(x,x) \mid x\in\mathbb{R}^d\}$ denote the diagonal and let $c:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}\cup\{+\infty\}$ be such that $c(x,y) := h(x-y) = l(|x-y|) \ge 0$, with $c$ and $l$ such that

(A1) $l:[0,\infty]\to[0,\infty]$ is strictly convex, strictly decreasing and $C^1$ on $(0,\infty)$;
(A2) $c\in C^1(\mathbb{R}^d\times\mathbb{R}^d\setminus\Delta,\mathbb{R})$;
(A3) $c:\mathbb{R}^d\times\mathbb{R}^d\to[0,+\infty]$ is lower semi-continuous;
(A4) for every $x_0\in\mathbb{R}^d$, $c(x_0,x_0) = +\infty$.

Remark 3.5. Note that [GM96] only assume that $l$ is continuous and not $C^1$; using their arguments, we could replace (A1) and (A2) by

(A1') $l:[0,\infty]\to[0,\infty]$ is strictly convex, decreasing and continuous on $(0,\infty)$;
(A2') $c\in C(\mathbb{R}^d\times\mathbb{R}^d\setminus\Delta,\mathbb{R})$.

The main result of this section, obtained by combining the results from Theorems 3.25 and 3.27, is

Theorem 3.6. Assume that $c$ and $l$ satisfy (A1)-(A4), and let $\mu,\nu\in\mathcal{P}(\mathbb{R}^d)$ be absolutely continuous with respect to the Lebesgue measure. Then there exists a unique minimizer $\gamma\in\Gamma(\mu,\nu)$ of the functional $C$ defined in (3.4), and a unique transport map $T$ pushing $\mu$ forward to $\nu$ such that $\gamma = (\mathrm{id},T)_\#\mu$ (or, in physics notation, $\gamma(x,y) = \delta_{T(x)}(y)\,\mu(x)$). This map is of the form $T(x) = x - \nabla h^*(\nabla\psi(x))$ for some c-concave function $\psi$ on $\mathbb{R}^d$.

We next give a simple example, which illustrates the emergence of such an optimal map T when the marginals 
consist of two Dirac delta functions, and explicitly compute the function h* appearing above for our cost 
function of interest. 

Example 3.7. Let $a, b\in\mathbb{R}$ with $a\ne b$. For $c$ and $l$ as in (A1)-(A4) we are interested in minimizing

$\int_{\mathbb{R}^2} c(x,y)\,d\gamma(x,y)$, (3.12)

subject to

$\int\gamma(x,y)\,dx = \delta_a(y) + \delta_b(y)$ and $\int\gamma(x,y)\,dy = \delta_a(x) + \delta_b(x)$, (3.13)

where $\delta_a$ is the Dirac function such that, for all Borel subsets $\Omega$ of $\mathbb{R}$, we have $\int_\Omega\delta_a(y) = 1$ if $a\in\Omega$, and $\int_\Omega\delta_a(y) = 0$ otherwise. We claim that the minimum of (3.12) subject to (3.13) is attained for

$\gamma(x,y) = \delta_a(x)\delta_b(y) + \delta_b(x)\delta_a(y) =: \gamma_0(x,y)$. (3.14)

To show this, note first that

$\gamma(x,y) = c_{aa}\delta_a(x)\delta_a(y) + c_{ab}\delta_a(x)\delta_b(y) + c_{ba}\delta_b(x)\delta_a(y) + c_{bb}\delta_b(x)\delta_b(y)$,

with $c_{aa}, c_{ab}, c_{ba}, c_{bb}\ge 0$ and $c_{aa}+c_{ab}+c_{ba}+c_{bb} = 2$. Due to the constraints on $\gamma$ from (3.13) we have $c_{ba} = c_{ab}$ and $c_{aa} = c_{bb}$. Hence

$\gamma(x,y) = \alpha\,\big(\delta_a(x)\delta_a(y) + \delta_b(x)\delta_b(y)\big) + \beta\,\big(\delta_a(x)\delta_b(y) + \delta_b(x)\delta_a(y)\big)$, with $\alpha,\beta\ge 0$ and $\alpha+\beta = 1$.

Minimizing (3.12) subject to (3.13) is then equivalent to the following problem:

Minimize $2\alpha\,l(0) + 2\beta\,l(|b-a|)$ subject to the constraints $\alpha,\beta\ge 0$, $\alpha+\beta = 1$.

Since by (A1) we have $l(0) > l(|a-b|)$, the minimum in the above is attained for $\alpha = 0$ and $\beta = 1$, which proves the claim.

Formula (3.14) for the minimizer admits a very interesting interpretation which motivates the notion of optimal transport map and foresees the structure of general minimizers as given in Theorem 4.8. Denote the single-particle density $\delta_a(x) + \delta_b(x)$ in (3.13) by $\rho(x)$, and introduce the map $T:\{a,b\}\to\{a,b\}$ which maps $a$ to $b$ and $b$ to $a$. Then $T$ pushes $\rho$ forward to $\rho$, and the optimal measure $\gamma_0$ has the form

$\gamma_0(x,y) = \delta_{T(x)}(y)\,\rho(x)$,

or, in measure-theoretic notation,

$\gamma_0 = (\mathrm{id},T)_\#\rho$.

□ 
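
The two-point example can be reproduced by brute force. The sketch below is a minimal illustration, not the method used in the proofs: it restricts attention to transport maps between finitely many unit point masses (a simplifying assumption, so that maps are permutations) and uses the Coulomb cost with value $+\infty$ on the diagonal. The function names are hypothetical.

```python
import itertools
import math


def coulomb(x: float, y: float) -> float:
    """Coulomb cost 1/|x - y|, equal to +inf on the diagonal as in (A4)."""
    return math.inf if x == y else 1.0 / abs(x - y)


def best_permutation(points):
    """Brute-force the Monge problem for unit masses sitting at `points`.

    Returns (minimal cost, permutation), where perm[i] = j means that
    the mass at points[i] is sent to points[j].
    """
    best_cost, best_map = math.inf, None
    for perm in itertools.permutations(range(len(points))):
        cost = sum(coulomb(points[i], points[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_map = cost, perm
    return best_cost, best_map


if __name__ == "__main__":
    # a = 0, b = 2: the identity has infinite cost, so the swap is optimal.
    print(best_permutation([0.0, 2.0]))
```

For $a = 0$, $b = 2$ the only finite-cost map swaps the two points, with total cost $2\,l(|b-a|) = 1$, matching $\alpha = 0$, $\beta = 1$ above.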



Example 3.8. (generalized Legendre transform for Coulomb cost) Let $h(x) = k(|x|)$ with $k(\lambda) = 1/\lambda$ for $\lambda > 0$ and $+\infty$ for $\lambda\le 0$. Note that $k:\mathbb{R}\to\mathbb{R}\cup\{+\infty\}$ is lower semi-continuous and convex. The ordinary Legendre transform of $k$ is given by $k^*(s) = \sup_{\beta\in\mathbb{R}}\{\beta s - k(\beta)\}$. Since $k(\beta) = +\infty$ for $\beta\le 0$, nonpositive values of $\beta$ do not contribute to the above supremum (the term in brackets then being $-\infty$). Consequently,

$k^*(s) = \sup_{\beta>0}\{\beta s - k(\beta)\}$.

When $s > 0$, we infer $k^*(s) = +\infty$. It thus remains to calculate

$k^*(-s) = \sup_{\beta>0}\{-\beta s - k(\beta)\}, \quad s > 0$.

Recall now that $k(\beta) = 1/\beta$. The elementary calculus problem of maximising the function in brackets on $(0,\infty)$ has the unique solution $\beta = 1/\sqrt{s}$, whence $k^*(-s) = -2\sqrt{s}$ for $s > 0$. Altogether it follows that the generalized Legendre transform of $h$ is

$h^*(x) = -2\sqrt{|x|}, \quad x\in\mathbb{R}^d$. (3.15)

Consequently, if $c(x,y) = 1/|x-y|$ then the optimal map $T$ in Theorem 3.6 is of the form $T(x) = x + \frac{\nabla\psi(x)}{|\nabla\psi(x)|^{3/2}}$.
□ 
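
The closed form $k^*(-s) = -2\sqrt{s}$ can be checked numerically. The following minimal sketch approximates the supremum over $\beta > 0$ on a uniform grid (the grid size and cutoff `beta_max` are arbitrary choices of this sketch) and also evaluates the displacement $-\nabla h^*(p) = p/|p|^{3/2}$ that appears in the formula for $T$. The function names are hypothetical.

```python
import math


def k(beta: float) -> float:
    """Radial Coulomb cost: 1/beta for beta > 0, +inf otherwise."""
    return 1.0 / beta if beta > 0 else math.inf


def k_star_at_minus(s: float, n: int = 200_000, beta_max: float = 50.0) -> float:
    """Grid approximation of k*(-s) = sup over beta > 0 of (-beta * s - k(beta))."""
    betas = (beta_max * (i + 1) / n for i in range(n))
    return max(-b * s - k(b) for b in betas)


def coulomb_displacement(p):
    """Displacement -grad h*(p) = p / |p|^(3/2) for h*(x) = -2 sqrt(|x|), p != 0."""
    norm = math.sqrt(sum(c * c for c in p))
    return [c / norm ** 1.5 for c in p]


if __name__ == "__main__":
    for s in (0.25, 1.0, 4.0):
        print(s, k_star_at_minus(s), -2.0 * math.sqrt(s))
    print(coulomb_displacement([4.0, 0.0, 0.0]))
```

Note that the displacement has length $|p|^{-1/2}$, so a small gradient of $\psi$ corresponds to moving mass far away, in line with the repulsive nature of the Coulomb cost.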

Subsections 3.1-3.4 are devoted to the proof of Theorem 3.6. The proofs follow partially the proofs in [GM96], [KM07], [GM95] and [GO07]. In the final subsection, 3.5, we derive some general properties of the unique optimal measure and the optimal cost under the assumption of equal marginals, $\mu = \nu$.

3.1 Definitions and notation

In the following we present some definitions which are needed throughout the remainder of the section.

Definition 3.9. (1) Let $V\subset\mathbb{R}^d$. A c-concave function $\psi:\mathbb{R}^d\to\mathbb{R}\cup\{-\infty\}$ is said to be the c-transform on $V$ of a function $\phi$ if (3.11) holds with $\mathcal{A}\subset V\times\mathbb{R}$. Moreover,

$\psi(x) = \inf_{y\in V}\{c(x,y) - \phi(y)\}$,

for some function $\phi: V\to\mathbb{R}\cup\{-\infty\}$.

(2) The c-transform of a function $\psi:\mathbb{R}^d\to\mathbb{R}\cup\{-\infty\}$ is the function $\psi^c:\mathbb{R}^d\to\mathbb{R}\cup\{-\infty\}$ defined by

$\psi^c(y) = \inf_{x\in\mathbb{R}^d}\{c(x,y) - \psi(x)\}$.

(3) A subset $S\subset X\times Y$ is called c-cyclically monotone, if for any finite number of points $(x_j,y_j)\in S$, $j = 1,\dots,n$ and permutations $\sigma:\{1,..,n\}\to\{1,..,n\}$,

$\sum_{j=1}^{n} c(x_j,y_j) \le \sum_{j=1}^{n} c(x_{\sigma(j)},y_j)$. (3.16)
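
For a finite set of pairs, inequality (3.16) can be tested directly. The sketch below is a brute-force illustration for the one-dimensional Coulomb cost; it checks every sub-collection of distinct pairs and every permutation, which is only practical for very small sets. The function names are hypothetical, and restricting to distinct pairs is a simplification made in this sketch.

```python
import itertools
import math


def coulomb(x: float, y: float) -> float:
    """Coulomb cost 1/|x - y|, equal to +inf on the diagonal."""
    return math.inf if x == y else 1.0 / abs(x - y)


def is_c_cyclically_monotone(pairs, cost=coulomb) -> bool:
    """Check (3.16) for every sub-collection of distinct pairs and every permutation."""
    for n in range(2, len(pairs) + 1):
        for subset in itertools.combinations(pairs, n):
            base = sum(cost(x, y) for x, y in subset)
            for sigma in itertools.permutations(range(n)):
                # Pair the x-component of pair sigma(j) with the y-component of pair j.
                permuted = sum(cost(subset[sigma[j]][0], subset[j][1]) for j in range(n))
                if base > permuted:
                    return False
    return True


if __name__ == "__main__":
    print(is_c_cyclically_monotone([(0.0, 2.0), (2.0, 0.0)]))  # swap of two points
    print(is_c_cyclically_monotone([(0.0, 1.0), (3.0, 2.0)]))  # short-range pairing
```

The first set is the support of the swap map from the two-point example; the second pairs each point with its nearest neighbour, which the Coulomb cost penalizes, so exchanging the targets lowers the total cost.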

Remark 3.10. Remark 3.4 of [GO07] proves the following: If $\psi:\mathbb{R}^d\to\mathbb{R}\cup\{-\infty\}$ is not identically $-\infty$, and is given by (3.11), then we have

(i) $\psi^c(y)\ge -\lambda > -\infty$ for all $(y,\lambda)\in\mathcal{A}$, where $\mathcal{A}$ is the set in (3.11). Hence, $\psi^c$ is not identically $-\infty$.

(ii) $\psi^{cc} = \psi$.

Theorem 3.11 (optimal measures have c-cyclically monotone support: Proposition 3.2 of [GO07]). Assume that $X, Y\subset\mathbb{R}^d$ are closed sets, that $\mu\in\mathcal{P}(X)$, $\nu\in\mathcal{P}(Y)$ and that $c\ge 0$ is lower semicontinuous on $X\times Y$. Then the following hold:

(a) There is at least one optimal measure $\gamma\in\Gamma(\mu,\nu)$.

(b) Suppose that in addition $c\in C(X\times Y,\bar{\mathbb{R}})$. Unless $C = +\infty$ throughout $\Gamma(\mu,\nu)$, there is a c-cyclically monotone set $S\subset X\times Y$ containing the support of all optimal measures in $\Gamma(\mu,\nu)$.

Definition 3.12. (1) A function $\psi:\mathbb{R}^d\to\mathbb{R}\cup\{-\infty\}$ is superdifferentiable at $x\in\mathbb{R}^d$, if $\psi(x)$ is finite and there exists $y\in\mathbb{R}^d$ such that

$\psi(x+z)\le\psi(x) + \langle z,y\rangle + o(|z|)$ as $|z|\to 0$; (3.17)

here $o(\lambda)$ means terms $r(\lambda)$ such that $r(\lambda)/\lambda$ tends to zero with $\lambda$.

(2) A pair $(x,y)$ belongs to the superdifferential $\partial^{\cdot}\psi\subset\mathbb{R}^d\times\mathbb{R}^d$ of $\psi$, if $\psi(x)$ is finite and (3.17) holds, in which case $y$ is called a supergradient of $\psi$ at $x$. Such supergradients $y$ comprise the set $\partial^{\cdot}\psi(x)\subset\mathbb{R}^d$, while for $V\subset\mathbb{R}^d$ we define $\partial^{\cdot}\psi(V) := \cup_{x\in V}\,\partial^{\cdot}\psi(x)$.

(3) The analogous notions of subdifferentiability, subgradients and the subdifferential $\partial_{\cdot}\psi$ are defined by reversing inequality (3.17).

(4) A real-valued function $\psi$ will be differentiable at $x$ precisely if it is both super- and subdifferentiable there; then

$\partial^{\cdot}\psi(x) = \partial_{\cdot}\psi(x) = \{\nabla\psi(x)\}$.

Definition 3.13. The c-superdifferential $\partial^c\psi$ of $\psi:\mathbb{R}^d\to\mathbb{R}\cup\{-\infty\}$, not identically $-\infty$, consists of the pairs $(x,y)\in\mathbb{R}^d\times\mathbb{R}^d$ for which $c(x,y) - \psi(x)\le c(z,y) - \psi(z)$ for all $z\in\mathbb{R}^d$.

Lemma 3.14 (relating c-superdifferentials to subdifferentials: Lemma 3.1 of [GM96]). Let $h:\mathbb{R}^d\to\mathbb{R}\cup\{+\infty\}$ and $\psi:\mathbb{R}^d\to\mathbb{R}\cup\{-\infty\}$. If $c(x,y) = h(x-y)$, then $(x,y)\in\partial^c\psi$ implies $\partial_{\cdot}\psi(x)\subset\partial_{\cdot}h(x-y)$. When $h$ and $\psi$ are differentiable, then $\nabla\psi(x) = \nabla h(x-y)$.

Definition 3.15. A function $\psi:\mathbb{R}^d\to\mathbb{R}\cup\{-\infty\}$ is said to be locally semi-concave (locally semi-convex) at $p\in\mathbb{R}^d$, if there is a constant $\lambda < \infty$, which makes $\psi(x) - \lambda|x|^2$ concave (convex) on some (small) open ball centered at $p$.

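Remark 3.16. Suppose that $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$ have no atoms, that $\gamma^*$ minimizes $C[\gamma]$ over $\Gamma(\mu, \nu)$ and that $C[\gamma^*] < \infty$. Then $\gamma^*(\Delta) = 0$ and so $\operatorname{supp}\gamma^* \setminus \Delta$ contains at least one element, say $(x_0, y_0)$. Also $\gamma^*(E) = 0$, where $E = (\{x_0\} \times Y) \cup (X \times \{y_0\})$. Hence the set $(X \times Y) \setminus (E \cup \Delta)$ is non-empty, so it contains an element $(\tilde x_0, \tilde y_0)$. Note that $\tilde x_0, \tilde y_0 \notin \{x_0, y_0\}$.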

We will need the non-atomic property of the marginal measures $\mu$ and $\nu$ (and the resulting remark above) from the uniqueness section 3.3 onwards, as a means to bypass the singularity of $c$ on the diagonal. We will use Remark 3.16 in Lemma 3.20 and in Lemma 3.21 below.

Remark 3.17. Suppose that $S \subset X \times Y$ is c-cyclically monotone and contains two pairs $(x_0, y_0)$, $(\tilde x_0, \tilde y_0)$ such that $x_0 \neq y_0$, $\tilde x_0 \neq \tilde y_0$ and $\tilde x_0, \tilde y_0 \notin \{x_0, y_0\}$. Then for all $(x, y) \in S$, we have $x \neq y$.

Proof. Let us assume that $(x, y) \in S$ with $x = y$. There are the following possibilities to consider: $x = y = x_0$, $x = y = y_0$, $x = y = \tilde x_0$, $x = y = \tilde y_0$ and $x, y \notin \{x_0, y_0, \tilde x_0, \tilde y_0\}$.

We present a proof for one of the cases, the other cases being treated analogously. Consider $x = y = y_0$. Then $x, y \notin \{x_0, \tilde x_0, \tilde y_0\}$ and from (3.16) we get

$$c(x, y) \le c(x, \tilde y_0) + c(\tilde x_0, y) + c(x_0, y_0) - c(x_0, y_0) - c(\tilde x_0, \tilde y_0) < +\infty,$$

which leads to a contradiction as $c(x, y) = +\infty$. □



3.2 Existence of an optimal measure with c-cyclical monotone support 



The main issue in this section is not the existence of an optimal measure, as the existence of an optimal measure is assured by Theorem 3.11 (a), but the existence of an optimal measure with c-cyclically monotone support. In order to use Theorem 3.11 (b) and construct such an optimal measure, we need to first construct a joint measure $\gamma$ with marginals $\mu$ and $\nu$ and with $C[\gamma] < \infty$. This is done in the lemma below.

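Lemma 3.18. Suppose that $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ are absolutely continuous with respect to the Lebesgue measure. Then there exist $\gamma \in \Gamma(\mu, \nu)$ and $\varepsilon > 0$ such that for all $(x, y) \in \operatorname{supp}\gamma$,

$$|x - y| \ge \varepsilon.$$

Proof. The proof follows similar arguments as the proof of Proposition 4.1 from [GU07], adapted to our situation.

Let $b, c \in \mathbb{R}_+$ and let $S(0; b) := \{x \in \mathbb{R}^d : |x| < b\}$, $S^c(0; b) := \{x \in \mathbb{R}^d : |x| \ge b\}$ and $S(b; c) := \{x \in \mathbb{R}^d : b \le |x| < c\}$. Since $\mu$ and $\nu$ are absolutely continuous with respect to the Lebesgue measure, the functions

$$t \mapsto \mu|_{S(0;t)}, \quad t \mapsto \nu|_{S(0;t)}, \quad t \mapsto \mu|_{S^c(0;t)}, \quad t \mapsto \nu|_{S^c(0;t)} \tag{3.18}$$

are continuous.

Step 1. We assume first that there exists $b \in \mathbb{R}_+$ such that $\operatorname{supp}\mu \subset S(0; b)$ and $\operatorname{supp}\nu \subset S^c(0; b)$. Then, because of (3.18), we may choose $\varepsilon_1, \varepsilon_2 > 0$ such that

$$\mu(S(0; b - \varepsilon_1)) = \nu(S(b; b + \varepsilon_2)) = \tfrac{1}{2} \quad \text{with } \varepsilon_1 < b.$$

Let

$$\mu^- = \mu|_{S(0; b - \varepsilon_1)}, \quad \mu^+ = \mu|_{S(b - \varepsilon_1; b)}, \quad \nu^- = \nu|_{S(b; b + \varepsilon_2)} \quad \text{and} \quad \nu^+ = \nu|_{S^c(0; b + \varepsilon_2)}.$$

Set

$$\gamma := 2\,(\mu^- \otimes \nu^- + \mu^+ \otimes \nu^+).$$

Note that $\gamma \in \Gamma(\mu, \nu)$ and for all $(x, y) \in \operatorname{supp}\gamma$, we have

$$|x - y| \ge \min\{\varepsilon_1, \varepsilon_2\} > 0.$$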

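Step 2. Assume that $\mu$ and $\nu$ are arbitrary in $\mathcal{P}(\mathbb{R}^d)$. We use (3.18) to choose $b \in \mathbb{R}_+$ such that

$$\mu(S(0; b)) = \nu(S^c(0; b)) = m. \tag{3.19}$$

If $m = 0$, then we reduce the discussion to Step 1. Similarly if $m = 1$. We can therefore assume that $0 < m < 1$. More precisely, if we denote $f(b) := \mu(S(0; b)) - \nu(S^c(0; b))$ for all $b \in \mathbb{R}_+$, then $f(b)$ is an increasing and continuous function of $b$, going from negative values to positive ones as $b$ goes from $0$ to $+\infty$. Therefore, there exists $b_0 \in \mathbb{R}_+$ such that $f(b_0) = 0$, which is equivalent to $\mu(S(0; b_0)) = \nu(S^c(0; b_0))$. Set

$$\mu^- = \tfrac{1}{m}\,\mu|_{S(0; b_0)}, \quad \mu^+ = \tfrac{1}{1-m}\,\mu|_{S^c(0; b_0)}, \quad \nu^- = \tfrac{1}{1-m}\,\nu|_{S(0; b_0)} \quad \text{and} \quad \nu^+ = \tfrac{1}{m}\,\nu|_{S^c(0; b_0)}.$$

By (3.19), $\mu^-$ and $\nu^+$ are probability measures. They satisfy

$$\operatorname{supp}\mu^- \subset S(0; b_0) \quad \text{and} \quad \operatorname{supp}\nu^+ \subset S^c(0; b_0).$$

Therefore, they satisfy the assumptions of Step 1, so there exist $\delta_1 > 0$ and $\gamma_1 \in \Gamma(\mu^-, \nu^+)$ such that for all $(x, y) \in \operatorname{supp}\gamma_1$, we have

$$|x - y| \ge \delta_1.$$

Similarly, there exist $\delta_2 > 0$ and $\gamma_2 \in \Gamma(\mu^+, \nu^-)$ such that for all $(x, y) \in \operatorname{supp}\gamma_2$, we have

$$|x - y| \ge \delta_2.$$

Set $\gamma := m\gamma_1 + (1 - m)\gamma_2$. Then $\gamma \in \Gamma(\mu, \nu)$ and for all $(x, y) \in \operatorname{supp}\gamma$, we have

$$|x - y| \ge \min\{\delta_1, \delta_2\}.$$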

□ 
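The Step 1 construction can be illustrated numerically. The following is a minimal sketch, assuming that equal-weight point samples on the real line stand in for the absolutely continuous measures (an assumption made purely for illustration); the helper names `step1_coupling` and `min_separation` are hypothetical and do not come from the source. It splits each sample set into an inner and an outer half by radius, couples inner with inner and outer with outer, and reports the total mass and the smallest distance on the support.

```python
import random

def step1_coupling(mu_pts, nu_pts):
    """Discrete analogue of Step 1 (illustrative assumption: equal-weight atoms).
    Couples the inner half of mu with the inner half of nu and the outer half
    of mu with the outer half of nu. Returns (x, y, weight) triples."""
    n, m = len(mu_pts), len(nu_pts)
    assert n % 2 == 0 and m % 2 == 0, "even sizes keep each half at mass 1/2"
    mu_sorted = sorted(mu_pts, key=abs)
    nu_sorted = sorted(nu_pts, key=abs)
    mu_in, mu_out = mu_sorted[: n // 2], mu_sorted[n // 2:]
    nu_in, nu_out = nu_sorted[: m // 2], nu_sorted[m // 2:]
    # gamma = 2 * (product of two half-measures); atoms carry mass 1/n and 1/m.
    w = 2.0 * (1.0 / n) * (1.0 / m)
    coupling = []
    for xs, ys in ((mu_in, nu_in), (mu_out, nu_out)):
        coupling.extend((x, y, w) for x in xs for y in ys)
    return coupling

def min_separation(coupling):
    """Smallest |x - y| over the support of the discrete coupling."""
    return min(abs(x - y) for x, y, _ in coupling)

rng = random.Random(0)
# supp mu inside |x| < 1, supp nu inside |y| >= 1 (the Step 1 situation, b = 1)
mu_pts = [rng.uniform(-0.999, 0.999) for _ in range(200)]
nu_pts = [rng.choice((-1, 1)) * rng.uniform(1.0, 2.0) for _ in range(200)]

gamma = step1_coupling(mu_pts, nu_pts)
total_mass = sum(w for _, _, w in gamma)
eps = min_separation(gamma)
print(f"total mass = {total_mass:.6f}, min separation = {eps:.4f}")
```

Because every atom of the sampled $\mu$ lies strictly inside the unit ball and every atom of the sampled $\nu$ lies outside it, the reported separation is strictly positive, which is the discrete counterpart of the bound $|x - y| \ge \min\{\varepsilon_1, \varepsilon_2\}$.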

Theorem 3.19 (existence of optimal measure with c-cyclically monotone support). Assume that $c(x, y) = h(x - y) := l(|x - y|)$ satisfies (A1)-(A4) and let $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ be absolutely continuous with respect to the Lebesgue measure. Then there exists a measure $\gamma_0$ with c-cyclically monotone support which minimizes the functional $C[\gamma]$ introduced in (3.4) over $\Gamma(\mu, \nu)$.

Proof. By Lemma 3.18 there exist $\gamma \in \Gamma(\mu, \nu)$ and $\varepsilon > 0$ such that

$$|x - y| \ge \varepsilon \tag{3.20}$$

for all $(x, y) \in \operatorname{supp}\gamma$. Since $l$ is strictly decreasing on $(0, \infty)$, (3.20) together with (A1) ensures that $c$ is uniformly bounded from above on $\operatorname{supp}\gamma$ by $l(\varepsilon)$. This proves that $C[\gamma] < \infty$. The statement now follows immediately from Theorem 3.11. □



3.3 Geometrical characterization of the optimal measure 

It is well known that a set is cyclically monotone if and only if it is contained in the subdifferential of a c-concave function; this result was proved for general cost functions $c : X \times Y \to \mathbb{R}$ in [SK92]. The following theorem is a further extension that is needed to deal with cost functions which satisfy (A1)-(A3) (and are allowed to take the value $+\infty$ somewhere).

Lemma 3.20. Suppose that $X, Y \subset \mathbb{R}^d$ are closed sets.

(1) For $S \subset X \times Y$ to be c-cyclically monotone, it is necessary and sufficient that $S \subset \partial^c\psi$ for some c-concave $\psi : X \to \mathbb{R} \cup \{-\infty\}$.

(2) Suppose that $S \subset X \times Y$ is c-cyclically monotone and contains two pairs $(x_0, y_0)$, $(\tilde x_0, \tilde y_0)$ such that $\tilde x_0, \tilde y_0 \notin \{x_0, y_0\}$. Let $\psi : \mathbb{R}^d \to \mathbb{R} \cup \{-\infty\}$ be the c-concave function from (1). Then

(2a) $\psi(x_0)$ and $\psi(\tilde x_0)$ are finite,

(2b) whenever $(x, y) \in \partial^c\psi$, we have that $\psi(x) > -\infty$ and $x \neq y$,

(2c) we have $\mu$-a.e.

$$\psi(x) = \inf\,\{c(x, y) - \psi^c(y),\ x \neq y\}, \quad x \in X.$$

Proof. Part (1) has been proved in Theorem 2.7 of [GM96] or Lemma 2.1 of [Ru96].

(2) As in the proof of Lemma 2.1 in [Ru96], we define

$$\psi(x) := \inf_{n \in \mathbb{N}}\ \inf_{(x_j, y_j)_{j=1}^{n} \subset S} \Big\{ c(x, y_n) + \sum_{j=0}^{n-1} c(x_{j+1}, y_j) - \sum_{j=0}^{n} c(x_j, y_j) \Big\}, \quad x \in X. \tag{3.21}$$

(2a) The construction above yields a c-concave $\psi$ on $X$ with $S \subset \partial^c\psi$. Standard arguments (see for example [Ru96]) give that $\partial^c\psi$ is c-cyclically monotone in the sense that

$$\sum_{j=0}^{n} c(x_j, y_j) \le \sum_{j=0}^{n-1} c(x_{j+1}, y_j) + c(x_{n+1}, y_n) \quad \text{with } x_{n+1} = x_0, \tag{3.22}$$

and so $\psi(x_0) \ge 0$. Taking $n = 1$, $x_1 = x_0$ and $y_1 = y_0$ in (3.21) gives that $\psi(x_0) \le 0$. We conclude that $\psi(x_0) = 0$ and so $\psi$ is finite and not identically $-\infty$. Note that by construction, $\psi$ is the c-concave function from (1). Recall now that $(x, y) \in \partial^c\psi$ is equivalent to

$$c(x, y) - \psi(x) \le c(z, y) - \psi(z), \quad z \in X. \tag{3.23}$$

Now recall Remark 3.16. Setting $(x, y) = (\tilde x_0, \tilde y_0)$, $z = x_0$ in (3.23), and using the fact that $\psi(x_0) = 0$ and that $x_0 \neq \tilde y_0$, we obtain that $\psi(\tilde x_0)$ is finite.

(2b) Next, if $(u_0, y) \in S$, setting $z = u_0$, we have that

$$c(x, y) - \psi(x) \le c(u_0, y) - \psi(u_0). \tag{3.24}$$

If $y \neq x_0$, we set $u_0 = x_0$ in (3.24) to obtain the claim. If $y = x_0$, we set $u_0 = \tilde x_0$ and we use the fact that $\psi(\tilde x_0)$ is finite to obtain the claim.

(2c) This representation is a simple consequence of the construction. □

We have proved the following. If we define

$$\partial^c_0\psi := \{(x, y) \in \partial^c\psi \mid x \neq y\} \quad \text{and} \quad \partial^c_0\psi(x) := \partial^c\psi(x) \setminus \{x\}, \tag{3.25}$$

then $\partial^c_0\psi = \partial^c\psi$ $\mu$-a.e., and we will focus on the off-diagonal elements from now on. We also assume for the rest of this chapter that $X = Y = \mathbb{R}^d$.

Lemma 3.21 ($\mu$-a.e. differentiability of c-transforms). Let $c$ and $l$ satisfy (A1)-(A4). Then the function $\psi$ from Lemma 3.20 is $\mu$-a.e. differentiable on $\mathbb{R}^d$.

Proof. Recall that

$$\psi(x) = \inf\,\{c(x, y) - \psi^c(y),\ x \neq y\}.$$

We will prove that $\psi$ is $\mu$-a.e. differentiable on $\mathbb{R}^d$.

Step 0. Let $r > 0$ be such that $\psi$ takes finite values at two or more points in $S(0; r)$ (this is possible due to Remark 3.16 and Lemma 3.20). Let $0 < a < r$ be arbitrarily fixed, which means that $S(0; a) \subset S(0; r)$. We will show in Steps 1-3 below that $\psi$ is $\mu$-a.e. differentiable on $S(0; a)$, from which we will derive in Step 4 the corresponding property on $\mathbb{R}^d$. The reason for the choice of $0 < a < r$ such that $S(0; a) \subset S(0; r)$ will become apparent in Step 2 below.

In order to prove that $\psi$ is $\mu$-a.e. differentiable on $S(0; a)$, take $x \in S(0; a)$ arbitrarily fixed. Then

$$\psi(x) = \min\{\psi_1(x), \psi_2(x)\},$$

where

$$\psi_1(x) = \inf_{y \in S(0; r)} \{c(x, y) - \psi^c(y),\ x \neq y\} \quad \text{and} \quad \psi_2(x) = \inf_{y \in S^c(0; r)} \{c(x, y) - \psi^c(y),\ x \neq y\}.$$

Due to the fact that $\psi$ takes finite values at two or more points in $S(0; r)$, it follows from the definition that $\psi_1$ and $\psi_2$ also take finite values at two or more points in $S(0; r)$.

Step 1. $\psi_1$ is locally Lipschitz and semi-concave on $S(0; a)$:

As $\psi_1$ takes finite values at two or more points, the proof follows the same reasoning as the proof of Proposition A.6 in [KM07] and will be omitted. Note also that by Proposition C.6 of [GM96], differentiability of $\psi_1$ can only fail on a set of $\mu$-measure zero.

Step 2. $\psi_2$ is locally Lipschitz and $\mu$-a.e. differentiable on $S(0; a)$:

Let $\delta := r - a > 0$. Define $\xi < 0$ by using the right derivative $2\delta\xi := l'(\delta^+)$ of $l$ at $\delta$. Then the function $l_\delta(\lambda) = l(\lambda) - \xi\lambda^2$ is strictly convex on $[\delta, \infty)$ and non-decreasing since $l_\delta'(\delta^+) = 0$. Extend this function to $\lambda \le \delta$ by making $l_\delta(\lambda)$ constant-valued there. Then $h_\delta(x) := l_\delta(|x|)$ will be convex on $\mathbb{R}^d$: taking $x, y \in \mathbb{R}^d$ and $0 < t < 1$ implies

$$h_\delta((1 - t)x + ty) \le l_\delta((1 - t)|x| + t|y|) \le (1 - t)h_\delta(x) + t\,h_\delta(y). \tag{3.26}$$

Note that $h(x) = h_\delta(x) + \xi|x|^2$ whenever $|x| \ge \delta$. Take $x \in S(0; a)$ and $y \in S^c(0; r)$; then $|x - y| \ge \delta$ and the definition of $\psi_2$ yields

$$\psi_2(x) - \xi|x|^2 = \inf_{y \in S^c(0; r)} \big\{ h_\delta(x - y) - 2\xi\langle x, y \rangle + \xi|y|^2 - \psi^c(y) \big\}.$$

Note now that $h_\delta$ satisfies conditions (H1)-(H3) of [GM96]. Since $l$ is continuous on $(0, \infty)$, we can apply Theorem 3.3 of [GM96] to $h_\delta$ and $\psi_2$ (see also Proposition C.2 in [GM96]), and thus $\psi_2(x) - \xi|x|^2$ will be locally Lipschitz. Using the fact that $\psi_2(x) - \xi|x|^2$ is locally Lipschitz, Rademacher's theorem shows that the gradient $\nabla\psi_2$ is defined $\mu$-a.e. on $S(0; r)$.
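The convexification used in Step 2 can be checked numerically for a concrete radial cost. The sketch below assumes the Coulomb-type profile $l(\lambda) = 1/\lambda$ (an illustrative choice among the costs satisfying the hypotheses); the helper name `make_l_delta` is hypothetical. It computes $\xi$ from $2\delta\xi = l'(\delta)$, builds $l_\delta$ with its constant extension below $\delta$, and tests monotonicity and convexity on a grid.

```python
def l(lam):
    """Illustrative radial cost l(lambda) = 1/lambda (assumption for this sketch)."""
    return 1.0 / lam

def l_prime(lam):
    """Derivative of the illustrative cost."""
    return -1.0 / lam**2

def make_l_delta(delta):
    """Return xi and l_delta, where l_delta(lam) = l(lam) - xi*lam**2 for
    lam >= delta and l_delta is constant (equal to l_delta(delta)) below delta."""
    xi = l_prime(delta) / (2.0 * delta)       # from 2*delta*xi = l'(delta)
    def l_delta(lam):
        lam_eff = max(lam, delta)             # constant extension for lam < delta
        return l(lam_eff) - xi * lam_eff**2
    return xi, l_delta

delta = 0.5
xi, l_delta = make_l_delta(delta)

grid = [0.01 * k for k in range(1, 501)]      # lambda in (0, 5]
values = [l_delta(lam) for lam in grid]

# Non-decreasing along the grid (small tolerance for rounding).
non_decreasing = all(v2 >= v1 - 1e-12 for v1, v2 in zip(values, values[1:]))
# Discrete convexity: second differences on the uniform grid are non-negative.
convex = all(values[i - 1] - 2 * values[i] + values[i + 1] >= -1e-9
             for i in range(1, len(values) - 1))
print(xi, non_decreasing, convex)
```

For this choice $\xi = -1/(2\delta^3)$ is negative, and $l(\lambda) = l_\delta(\lambda) + \xi\lambda^2$ holds for $\lambda \ge \delta$, matching the identity $h(x) = h_\delta(x) + \xi|x|^2$ for $|x| \ge \delta$.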

Step 3. $\psi$ is $\mu$-a.e. differentiable on $S(0; a)$:

Since $\psi(x) = \min\{\psi_1(x), \psi_2(x)\}$, the $\mu$-a.e. differentiability of $\psi$ on $S(0; a)$ follows immediately, as $\psi$ is the minimum of two $\mu$-a.e. differentiable functions.

Step 4. $\psi$ is $\mu$-a.e. differentiable on $\mathbb{R}^d$:

Let $(a_n)_{n \in \mathbb{N}}$ be an increasing sequence of positive real numbers tending to infinity as $n \to \infty$, so that $\mathbb{R}^d = \cup_{n \in \mathbb{N}} S(0; a_n)$. Let $A := \{x \in \mathbb{R}^d : \psi \text{ is differentiable at } x\}$. Then $\mu(A) = \lim_{n \to \infty} \mu(A \cap S(0; a_n))$. The statement now follows immediately by means of Step 3. □

The following is a version of Lemma 5.2 of [GM96] for strictly convex and decreasing $l$.

Lemma 3.22 (the c-superdifferential lies in the graph of a map). Let $c$ satisfy (A1)-(A4). Suppose that $\psi : \mathbb{R}^d \to \mathbb{R}$ is differentiable at some $x \in \mathbb{R}^d$. Then $y \in \partial^c\psi(x)$ implies that $h^*$ is differentiable at $\nabla\psi(x)$ and that $y = x - \nabla h^*(\nabla\psi(x))$.

Proof. We recall first that by (A1), $\nabla h$ is injective off the diagonal; the injectivity of $\nabla h$ off the diagonal is crucial for the argument of this lemma, as will become apparent below. The proof follows similar steps as the proof of Lemma 5.2 in [GM96]. Let $y \in \partial^c_0\psi(x)$. Then $x \neq y$, so $h$ is differentiable at $x - y$. From Lemma 3.14 (see also Lemma 3.1 of [GM96]) we have that $\partial_{\cdot}\psi(x) \subset \{\nabla h(x - y)\}$. As $\psi$ is differentiable at $x$, we have $\partial_{\cdot}\psi(x) = \{\nabla\psi(x)\}$ and $\nabla\psi(x) = \nabla h(x - y)$. Since $x \neq y$ and since $l$ is strictly convex and strictly decreasing, the gradient $\nabla\psi(x)$ does not vanish $\mu$-a.e. Lemma A.4 (ii)-(iii) implies both $(\nabla\psi(x), x - y) \in \partial_{\cdot}h^*$ and differentiability of $h^*$ at $\nabla\psi(x)$, whence $\nabla h^*(\nabla\psi(x)) = x - y$. □

Lemma 3.23. Let $c$ and $l$ satisfy (A1)-(A4) and let $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$. Suppose that a joint measure $\gamma \in \Gamma(\mu, \nu)$ has $\operatorname{supp}\gamma \subset \partial^c_0\psi = \partial^c\psi$, where $\psi : \mathbb{R}^d \to \mathbb{R}$ is the c-transform of a function on $\operatorname{supp}\nu$. The map $T(x) := x - \nabla h^*(\nabla\psi(x))$ pushes $\mu$ forward to $\nu$. In fact, $\gamma = (\mathrm{id}, T)_\#\mu$ and $T_\#\mu = \nu$.

Proof. The proof follows the same steps as Theorem 5.4 from [GM96]. For the reader's convenience, we provide the reasoning below.

To begin, one would like to know that the map $T(\cdot)$ is Borel and defined $\mu$-a.e. By Lemma 3.21, differentiability of $\psi$ can only fail on a set $N$ of $\mu$-measure zero in $\mathbb{R}^d$. Thus $\mu(N) = 0$, $\gamma(N \times Y) = 0$, and so the map $\nabla\psi$ is defined $\mu$-a.e. Since by Remark 3.16 $\gamma(\Delta) = 0$ and since $\operatorname{supp}\gamma \subset \partial^c\psi = \partial^c_0\psi$, we have $\gamma(\partial^c_0\psi) = 1$. Therefore, define $S := \{(x, y) \in \partial^c_0\psi \mid x \in \operatorname{dom}\nabla\psi\}$, where $\operatorname{dom}\nabla\psi$ denotes the subset of $\mathbb{R}^d$ on which $\psi$ is differentiable. Lemma 3.21 shows that $\psi$ is $\mu$-a.e. differentiable on $\operatorname{dom}\psi$. Since its gradient is obtained as the pointwise limit of a sequence of continuous approximants (finite differences), $\nabla\psi$ is Borel measurable on the (Borel) set $\operatorname{dom}\nabla\psi$ where it can be defined. Lemma A.4 shows that $\nabla h^*$ is a Borel map. For $(x, y) \in S$, Lemma 3.22 implies that $T$ is defined at $x$ and $y = T(x)$. Thus $T$ is defined on the projection of $S$ onto $\mathbb{R}^d$ by $\pi(x, y) := x$; it is a Borel map since $\nabla\psi$ and $\nabla h^*$ are. Moreover, the set $\pi(S)$ is Borel and has full measure for $\mu$: both $\partial^c_0\psi$ and $\pi(\partial^c_0\psi)$ are $\sigma$-compact, so $\pi(S) = \pi(\partial^c_0\psi) \cap \operatorname{dom}\nabla\psi$ is the intersection of two Borel sets with full measure. It remains to check that $(\mathrm{id}, T)_\#\mu = \gamma$, from which $T_\#\mu = \nu$ follows immediately.

It suffices now to show that the measure $(\mathrm{id}, T)_\#\mu$ coincides with $\gamma$ on products $U \times V$ of Borel sets $U, V \subset \mathbb{R}^d$; the semi-algebra of such products generates all Borel sets in $\mathbb{R}^d \times \mathbb{R}^d$. Since $y = T(x)$ if $(x, y) \in S$, we have

$$(U \times V) \cap S = \big((U \cap T^{-1}(V)) \times \mathbb{R}^d\big) \cap S. \tag{3.27}$$

Being the intersection of two sets having full measure for $\gamma$ (the closed set $\partial^c_0\psi$ and the Borel set $\operatorname{dom}\nabla\psi \times \mathbb{R}^d$), the set $S$ is Borel with full measure. Thus $\gamma(Z \cap S) = \gamma(Z)$ for $Z \subset \mathbb{R}^d \times \mathbb{R}^d$. Applied to (3.27), this yields

$$\gamma(U \times V) = \gamma\big((U \cap T^{-1}(V)) \times \mathbb{R}^d\big) = \mu(U \cap T^{-1}(V)) = (\mathrm{id}, T)_\#\mu(U \times V).$$

Here $\gamma \in \Gamma(\mu, \nu)$ implies the second equation and Definition 3.6 implies the third. □

Remark 3.24. Note that by Lemma 3.20 and Lemma 3.22 we have

$$\mu(\{x \in \mathbb{R}^d : T(x) = x\}) = 0.$$

Theorem 3.25 (geometric representation of optimal solution). Assume that $c$ and $l$ satisfy (A1)-(A4), and let $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ be absolutely continuous with respect to the Lebesgue measure. Then the following hold:

(a) There exists an optimal measure $\gamma \in \Gamma(\mu, \nu)$ with c-cyclically monotone support;

(b) For any such $\gamma$, there is a function $\psi : \mathbb{R}^d \to \mathbb{R}$ which is the c-transform of some function on $\operatorname{supp}\gamma$ such that the map $T := \mathrm{id} - \nabla h^*(\nabla\psi)$ pushes $\mu$ forward to $\nu$ and satisfies $\gamma = (\mathrm{id}, T)_\#\mu$.

(c) $\gamma = (S \times \mathrm{id})_\#\nu$ for some inverse map $S : \mathbb{R}^d \to \mathbb{R}^d$.

(d) $S(T(x)) = x$ $\mu$-a.e., while $T(S(y)) = y$ $\nu$-a.e.

Proof. For completeness, we sketch the arguments used in [GM96].

(a) and (b) The existence of an optimal $\gamma$ with c-cyclically monotone support is guaranteed by Theorem 3.19. Then $\operatorname{supp}\gamma \subset \partial^c\psi$. The map $\pi(x, y) = x$ on $\mathbb{R}^d \times \mathbb{R}^d$ pushes $\gamma$ forward to $\mu = \pi_\#\gamma$ while projecting the closed set $\partial^c_0\psi$ to a $\sigma$-compact set of full measure for $\mu$. From Lemma 3.21 we know that $\psi$ is differentiable $\mu$-a.e. Lemma 3.23 shows that $T(\cdot)$ pushes $\mu$ forward to $\nu$ while $\gamma$ coincides with the measure $(\mathrm{id}, T)_\#\mu$.

The proofs of (c) and (d) follow the same reasoning as the proofs of (iv) and (v) from Theorem 4.6 in [GM96] and will be omitted. □

3.4 Uniqueness of the optimal measure 

Lemma 3.26 (c-superdifferentiability of c-transforms). Let $c$ and $l$ satisfy (A1)-(A4) and let $V \subset \mathbb{R}^d$ be a closed set. Let $\psi : \mathbb{R}^d \to \mathbb{R}$ be the c-transform of a function on $V$ and suppose that $T := \mathrm{id} - \nabla h^*(\nabla\psi)$ can be defined at some $p \in \mathbb{R}^d$ (i.e., $\psi$ is differentiable at $p$ and $\nabla h^*$ exists at $\nabla\psi(p)$). Then $\partial^c_0\psi(p) = \{T(p)\}$.

Proof. The proof follows similar arguments as the proof of Proposition 6.1 of [GM96]. From Lemma 3.22 it is clear that $\partial^c_0\psi(p) \subset \{T(p)\}$. Therefore, we only need to prove that $\partial^c_0\psi(p)$ is non-empty. Assume that $T(p)$ is defined for some $p \in \mathbb{R}^d$. By c-concavity of $\psi$, there is a sequence $(y_n, \alpha_n)_{n=1}^{\infty} \subset A \subset V \times \mathbb{R}$ such that

$$\psi(p) = \lim_{n \to \infty}\, [c(p, y_n) + \alpha_n]. \tag{3.28}$$

As is shown below, $(|y_n|)_{n=1}^{\infty}$ must be bounded. We first assume this bound to complete the proof. Since $(y_n)_{n=1}^{\infty}$ is bounded, a subsequence must converge to a limit $y_n \to y$ in the closed set $V$. On the other hand, $y \in \partial^c\psi(p)$ since for all $x \in \mathbb{R}^d$, the c-concavity of $\psi$ and (3.28) imply

$$\psi(x) \le \inf_{n \in \mathbb{N}}\, \{c(x, y_n) + \alpha_n\} \le c(x, y) + \psi(p) - c(p, y),$$

with both $\psi(x), \psi(p) > -\infty$, as shown by Lemma 3.20 (2b). Thus, $p \neq y$ and $y \in \partial^c_0\psi(p)$. It remains only to prove that the sequence $(|y_n|)_{n=1}^{\infty}$ is bounded, which means that we can extract a subsequence that converges to a point $y \in Y$. To produce a contradiction, suppose that a subsequence diverges in a direction $\hat y_n \to \hat y$, where $\hat y_n := y_n/|y_n|$. Then $|p - y_n|$ is bounded away from zero by $\delta > 0$. Since for each arbitrarily small $x \in \mathbb{R}^d$ we have $|p - y_n - x| > 0$, it follows that $\nabla h$ exists at $p - y_n - x$. More precisely, for each $n$ the uniform subdifferentiability in Lemma A.3 gives

$$h(p - y_n) \ge h(p - y_n - x) - \langle x, w_n(x) \rangle + O_\delta(|x|^2), \quad \text{where } w_n(x) = -\frac{p - y_n - x}{|p - y_n - x|}\, l'(|p - y_n - x|), \tag{3.29}$$

for arbitrarily small $x \in \mathbb{R}^d$ and $O_\delta(|x|^2)$ independent of $n$. We re-write

$$h(p - y_n) \ge h(p - y_n - x) + \Big\langle x, \frac{p - y_n}{|p - y_n|} \Big\rangle \frac{|p - y_n|}{|p - y_n - x|}\, l'(|p - y_n - x|) - \frac{\langle x, x \rangle}{|p - y_n - x|}\, l'(|p - y_n - x|) + O_\delta(|x|^2). \tag{3.30}$$



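Recall now that the derivative of $l$ is negative and increasing and, therefore, $|l'(|p - y_n - x|)| \le |l'(\delta + |x|)|$.

The sequence $(|y_n|)_{n=1}^{\infty}$ can only diverge if $|l'(|p - y_n - x|)|$ tends to $|l'(\infty)| := \inf_\lambda |l'(\lambda)|$. Therefore, combining (3.30) with the definition of a c-concave function, we obtain

$$\psi(p) \ge \psi(p - x) - \langle x, \hat y \rangle\, l'(\infty) + O_\delta(|x|^2),$$

where the large $n$ limit has been taken using (3.28). By taking $z = -x$ in the above equation, we get

$$\psi(p) + \langle z, w_1 \rangle + O_\delta(|x|^2) \ge \psi(p + z), \quad \text{where } w_1 := -\hat y\, l'(\infty) \text{ and } |w_1| = |l'(\infty)|.$$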

Thus $w_1 \in \partial^{\cdot}\psi(p)$. On the other hand, differentiability of $\psi$ at $p$ implies $\partial^{\cdot}\psi(p) = \{\nabla\psi(p)\}$, whence $w_1 = \nabla\psi(p)$. Now $(w_1, p - T(p)) \in \partial_{\cdot}h^*$ follows from the definition of $T(p)$. Assume now that $p = T(p)$. It follows that $(w_1, 0) \in \partial_{\cdot}h^*$. If $h^*$ is non-constant, Lemma A.2 gives $(-|w_1|, 0) \in \partial_{\cdot}l^*$, a result which is obvious when $h^*$ is constant. Thus $(0, -|w_1|) \in \partial_{\cdot}l$ by Lemma A.1 (i). This conclusion contradicts the fact that $l$ is not differentiable at $0$. Therefore $T(p) \neq p$ and $(\nabla h)(p - T(p)) = w_1$. Lemma A.2 gives $(|p - T(p)|, -|w_1|) \in \partial_{\cdot}l$. Since $l$ is strictly convex, $|w_1| > |l'(\infty)|$, which produces a contradiction. Therefore, the sequence $(y_n)_{n=1}^{\infty}$ is bounded. □

Theorem 3.27 (uniqueness of the optimal map). Assume that $c$ and $l$ satisfy (A1)-(A4) and let $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ be absolutely continuous with respect to the Lebesgue measure. Then an optimal map $T$ pushing $\mu$ forward to $\nu$ is uniquely determined $\mu$-a.e. by the requirement that it is of the form $T(x) = x - \nabla h^*(\nabla\psi(x))$ for some c-concave $\psi$ on $\mathbb{R}^d$.

Proof. For completeness' sake, we will sketch the main idea of the proof, as given in Theorem 4.4 of [GM96].

We will prove by contradiction that $T$ is unique. That is, we assume that there exists, in addition to $T$ and $\psi$, a second c-concave function $\psi'$ for which $T'(x) := x - \nabla h^*(\nabla\psi'(x))$ pushes $\mu$ forward to $T'_\#\mu = T_\#\mu = \nu$. Recall now that $\psi$ and $\psi'$ are $\mu$-a.e. differentiable. $T$ and $T'$ are defined $\mu$-almost everywhere, and unless they coincide, there exists some $p \in \mathbb{R}^d$ at which both $\psi$ and $\psi'$ are differentiable but $T(p) \neq T'(p)$. From this, it is clear that $\nabla\psi(p) \neq \nabla\psi'(p)$.

Let $U := \{x \in \mathbb{R}^d \mid \psi(x) > \psi'(x)\}$. A contradiction will be derived by showing that the push-forwards $T_\#\mu = \nu$ and $T'_\#\mu = \nu$, alleged to coincide, must differ on $V := \partial^c\psi(U)$. We will show that

$$\mu(T^{-1}(V)) < \mu(U) \le \mu(T'^{-1}(V)).$$

The main ingredient necessary to prove the last equation is the fact that $\partial^c_0\psi(p) = \{T(p)\}$, which is proved in Lemma 3.26. By using this together with the fact that $\mu$ is absolutely continuous with respect to the Lebesgue measure, the proof follows via the same arguments as Theorem 4.4 from [GM96] and will be omitted. □

3.5 Some general properties of the optimal measure and the optimal cost for equal marginals

We will investigate in this subsection the case of equal marginals $\mu = \nu$ with common density $\rho$, and assume throughout that the cost function $c$ satisfies conditions (A1)-(A4). We also introduce the optimal cost

$$E_{OT}^{\mathrm{norm}}[\rho] := \inf_{\gamma \in \Gamma(\rho, \rho)} C[\gamma].$$

$E_{OT}^{\mathrm{norm}}[\rho]$ corresponds to the non-normalized functional $E_{OT}$ introduced in Section 3 via

V2 j 



For the proof of Theorem 3.29 below, we will need the following lemma.

Lemma 3.28. For any marginals $\mu$ and $\nu$, and any map $T$ which pushes $\mu$ forward to $\nu$, the densities $\mu$, $\nu$ and the map $T$ satisfy the following equation

$$\mu(x) = \nu(T(x))\, |\det DT(x)|,$$

where $DT$ is the approximate gradient of $T$ (for a definition see for example Definition 10.2 of [Vill09]).

Proof. We have that $\mu(T^{-1}(A)) = \nu(A)$ for all Borel sets $A \subset \mathbb{R}^d$. Then for any such $A$ we have

$$\int_{y \in T(A)} d\nu(y) = \int_{x \in A} d\mu(x). \tag{3.31}$$

With the change of variables $y = T(x)$, the left-hand side of (3.31) becomes

$$\int_{y \in T(A)} d\nu(y) = \int_{x \in A} |\det DT(x)|\, \nu(T(x))\, dx. \tag{3.32}$$

From (3.31) and (3.32), the claim follows. □
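The change-of-variables identity of Lemma 3.28 can be sanity-checked in one dimension. The sketch below assumes an illustrative pair that is not taken from the source: $\mu$ uniform on $(0, 1)$ and $T(x) = x^2$, whose push-forward has density $\nu(y) = 1/(2\sqrt{y})$ on $(0, 1)$. It evaluates the residual of $\mu(x) = \nu(T(x))\,|T'(x)|$ on a grid and independently compares the mass that $T$ sends into $[0, t]$ with the $\nu$-mass of $[0, t]$.

```python
import math

def mu_density(x):
    """Uniform density on (0, 1) (illustrative choice)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def T(x):
    """Illustrative transport map; not the optimal map of the paper."""
    return x * x

def dT(x):
    """Derivative of T, playing the role of det DT in one dimension."""
    return 2.0 * x

def nu_density(y):
    """Density of the push-forward of mu under T."""
    return 1.0 / (2.0 * math.sqrt(y)) if 0.0 < y < 1.0 else 0.0

# Midpoint grid on (0, 1).
xs = [(i + 0.5) / 1000 for i in range(1000)]

# Residual of mu(x) = nu(T(x)) |T'(x)| over the grid.
max_residual = max(abs(mu_density(x) - nu_density(T(x)) * abs(dT(x))) for x in xs)

# Independent push-forward check: mu-mass of T^{-1}([0, t]) versus the
# nu-mass of [0, t], which equals sqrt(t) for this density.
t = 0.25
pushed_mass = sum(1 for x in xs if T(x) <= t) / len(xs)
nu_mass = math.sqrt(t)
print(max_residual, pushed_mass, nu_mass)
```

The residual vanishes up to rounding, and both masses agree, which is what the push-forward condition $\mu(T^{-1}(A)) = \nu(A)$ requires for $A = [0, t]$.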
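Theorem 3.29. Assume that $\mu = \nu$ with common density $\rho$. Then

(a) The optimal measure $\gamma_T$ which minimizes $C[\gamma]$ is symmetric, that is

$$\gamma_T(A \times B) = \gamma_T(B \times A) \quad \text{for all Borel } A, B \subset \mathbb{R}^d.$$

(b) The optimal cost is strictly convex in $\rho$.

(c) Let $c(x, y) = 1/|x - y|$. Then for all $\alpha > 0$ we have the following dilation behaviour

$$E_{OT}^{\mathrm{norm}}[\alpha^d \rho(\alpha\,\cdot)] = \alpha\, E_{OT}^{\mathrm{norm}}[\rho(\cdot)].$$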

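Proof. (a) Recall from Theorem 3.25 that $\gamma_T = (\mathrm{id}, T)_\#\mu$, where $T$ is the optimal map which pushes $\mu$ forward to $\nu$, i.e. $\mu(T^{-1}(A)) = \nu(A)$ for all $A \in \mathcal{B}(\mathbb{R}^d)$. Then $\gamma_T(x, y) = \mu((\mathrm{id}, T)^{-1}(x, y)) = \delta_{T(x)}(y)\,\mu(x)$. Using this we get

$$\gamma_T(A \times B) = \int_{x \in A} \int_{y \in B} \delta_{T(x)}(y)\, \mu(dx) = \int_{x \in A} \chi_B(T(x))\, \mu(dx), \tag{3.33}$$

where $\chi$ denotes the indicator function. We now use Lemma 3.28 and the fact that $\mu = \nu$; then the right-hand side of (3.33) becomes

$$\int_{x \in A} \chi_B(T(x))\, \mu(dx) = \int_{x \in A} \chi_B(T(x))\, \mu(T(x))\, |\det DT(x)|\, dx = \int_{y \in T^{-1}(A)} \chi_B(y)\, \mu(y)\, dy$$

$$= \int_{y \in B} \chi_{T^{-1}(A)}(y)\, \mu(y)\, dy = \int_{y \in B} \chi_A(T(y))\, \mu(y)\, dy = \gamma_T(B \times A).$$

(b) The convexity is immediate from the definition of $E_{OT}^{\mathrm{norm}}[\rho]$, and strict convexity follows from uniqueness.

(c) Fix any $\alpha > 0$. Then

$$E_{OT}^{\mathrm{norm}}[\alpha^d \rho(\alpha\,\cdot)] = \inf_{\gamma \in \Gamma(\alpha^d \rho(\alpha\cdot),\, \alpha^d \rho(\alpha\cdot))} \int \frac{1}{|x - y|}\, \gamma(x, y)\, dx\, dy$$

$$= \inf_{\alpha^{2d}\gamma(\alpha\cdot, \alpha\cdot)\,:\ \gamma \in \Gamma(\rho, \rho)} \int \frac{1}{|x - y|}\, \alpha^{2d}\gamma(\alpha x, \alpha y)\, dx\, dy$$

$$= \inf_{\gamma \in \Gamma(\rho, \rho)} \int \frac{1}{|x - y|}\, \alpha^{2d}\gamma(\alpha x, \alpha y)\, dx\, dy$$

$$= \inf_{\gamma \in \Gamma(\rho, \rho)} \alpha \int \frac{1}{|x' - y'|}\, \gamma(x', y')\, dx'\, dy'$$

$$= \alpha\, E_{OT}^{\mathrm{norm}}[\rho(\cdot)],$$

where for the second equality we used the fact that for $\gamma \in \Gamma(\rho, \rho)$ we have $\int \alpha^{2d}\gamma(\alpha x, \alpha y)\, dy = \alpha^d \rho(\alpha x)$, and for the penultimate equality we used the change of variables $x' = \alpha x$, $y' = \alpha y$. □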
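The dilation behaviour in part (c) already holds coupling by coupling, which is easy to verify on a discrete example. The sketch below assumes a hand-picked symmetric two-atom coupling on the real line (an illustration only; the helper names `coulomb_cost` and `dilate` are hypothetical). Since $\alpha^d \rho(\alpha x)$ is the law of $X/\alpha$ when $X$ has law $\rho$, dilating a coupling moves every atom $(x, y)$ to $(x/\alpha, y/\alpha)$.

```python
def coulomb_cost(coupling):
    """C[gamma] = sum of w / |x - y| over the atoms (x, y, w) of a discrete coupling."""
    return sum(w / abs(x - y) for x, y, w in coupling)

def dilate(coupling, alpha):
    """Coupling of the dilated marginals: each atom (x, y) moves to
    (x / alpha, y / alpha) and keeps its weight."""
    return [(x / alpha, y / alpha, w) for x, y, w in coupling]

# Symmetric coupling with equal marginals (1/2) delta_0 + (1/2) delta_1.
gamma = [(0.0, 1.0, 0.5), (1.0, 0.0, 0.5)]
alpha = 3.0

base_cost = coulomb_cost(gamma)
dilated_cost = coulomb_cost(dilate(gamma, alpha))
ratio = dilated_cost / base_cost
print(base_cost, dilated_cost, ratio)
```

Because dilation is a bijection between the couplings of $\rho$ with itself and the couplings of the dilated density with itself, the same factor $\alpha$ carries over to the infimum.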

Remark 3.30. (0) Trivially, the statements in Theorem 3.29 also hold for the non-normalized functional $E_{OT}$.

(1) Recall from Lemma 2.1 that the exact interaction energy is of the form

$$V_{ee} = \int_{\mathbb{R}^6} \frac{1}{|x - y|}\, \rho_2(x, y)\, dx\, dy,$$

with the exact pair density $\rho_2$ being symmetric due to the antisymmetry condition on the underlying wave function. Property (a) of Theorem 3.29 shows that the approximate interaction energy

$$E_{OT}[\rho] = \int_{\mathbb{R}^6} \frac{1}{|x - y|}\, \rho_2^{\mathrm{opt}}(x, y)\, dx\, dy$$

is of the same form, with the arising $\rho_2^{\mathrm{opt}}$ being automatically symmetric as a consequence of optimality coupled with the weaker symmetry condition that $\rho_2^{\mathrm{opt}}$ has equal marginals.

(2) Property (c) of Theorem 3.29 is a scaling property of the exact electron-electron energy $V_{ee}$ not shared by many approximate density functionals used in the physics literature, such as the local density approximation.

(3) The dilation behaviour of $E_{OT}$ equals that of the exact $V_{ee}$, as well as that of approximations like (2.23).



4 Explicit example - equal radially symmetric marginals 

As in the last subsection, we continue to investigate the case of equal marginals $\mu = \nu$ with common density $\rho : \mathbb{R}^d \to [0, \infty)$. Moreover, we assume that $\rho(x) > 0$ for all $x \in \operatorname{supp}\rho$. We will also assume throughout that the cost function $c$ satisfies conditions (A1)-(A4).

Throughout this section, for any dimension $d \in \mathbb{N}$ we will denote the optimal map by $T^{(d)}$. In subsection 4.1 we will explicitly compute $T^{(1)}$, and in subsection 4.2 we use the one-dimensional analysis to explicitly compute $T^{(d)}$ when $\rho$ is radially symmetric, that is to say when $\rho(x) = \lambda(|x|)$ for all $x \in \mathbb{R}^d$ and some function $\lambda$.

As it turns out, in the above situations the optimal map is universal with respect to all cost functions satisfying (A1)-(A4), but the fact that $c(x, y)$ decreases with the distance $|x - y|$ is essential.

4.1 Explicit solution for equal marginals in one dimension 

Let $\mu = \nu \in \mathcal{P}(\mathbb{R})$ be equal marginals on $\mathbb{R}$. Moreover, we define $I := \operatorname{supp}\mu$. Recall from Theorem 3.11 that the unique optimal measure has c-cyclically monotone support. This will help us to characterize the optimal map $T^{(1)}$ in the following lemmas.



[Figure 1: Configurations excluded by Lemma 4.1. Panels, in left-to-right order on the line: $x_1, y_1, y_2, x_2$; $x_1, x_2, y_2, y_1$; $y_1, x_1, x_2, y_2$; $y_1, y_2, x_2, x_1$.]



Lemma 4.1. Let $(x_1, y_1)$ and $(x_2, y_2)$ be two points in the support of the optimal measure $\gamma$, that is $y_1 = T^{(1)}(x_1)$ and $y_2 = T^{(1)}(x_2)$. The possible configurations (not counting the symmetries between $(x_1, y_1)$ and $(x_2, y_2)$) are: $x_1 < x_2 < y_1 < y_2$, $x_1 < y_2 < y_1 < x_2$, $y_1 < x_2 < x_1 < y_2$, $y_1 < y_2 < x_1 < x_2$, $x_1 < y_2 < x_2 < y_1$, $y_1 < x_2 < y_2 < x_1$, $x_1 < y_1 < x_2 < y_2$ and $y_1 < x_1 < y_2 < x_2$ (see also Figure 1 for the excluded configurations).

Proof. If $(x_1, y_1)$ and $(x_2, y_2)$ are two points in the support of the optimal measure $\gamma$, then

$$c(x_1, y_1) + c(x_2, y_2) \le c(x_1, y_2) + c(x_2, y_1).$$

Let us consider the excluded cases one by one.

(i) $x_1 < y_1 < y_2 < x_2$.

Then, due to the fact that $l$ is strictly decreasing, it follows that

$$c(x_1, y_1) + c(x_2, y_2) > c(x_1, y_2) + c(x_2, y_1),$$

which contradicts the c-cyclical monotonicity property of the optimal solution.

(ii) $y_1 < x_1 < x_2 < y_2$.
Similar to (i).

(iii) $x_1 < x_2 < y_2 < y_1$.

We have $y_2 - x_2 < y_2 - x_1 < y_1 - x_1$. Therefore, $y_2 - x_1 = t(y_2 - x_2) + (1 - t)(y_1 - x_1)$ and $y_1 - x_2 = (1 - t)(y_2 - x_2) + t(y_1 - x_1)$, where $t \in (0, 1)$. Thus, using the strict convexity of $h$, we have

$$c(x_2, y_1) + c(x_1, y_2) < t\,c(x_2, y_2) + (1 - t)\,c(x_1, y_1) + (1 - t)\,c(x_2, y_2) + t\,c(x_1, y_1) = c(x_1, y_1) + c(x_2, y_2),$$

which contradicts the c-cyclical monotonicity property of the optimal solution.

(iv) $y_1 < y_2 < x_2 < x_1$.
Similar to (iii).

□
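The two-point inequality used in this proof is easy to test on concrete numbers. The sketch below assumes the Coulomb-type cost $c(x, y) = 1/|x - y|$ as a representative of the admissible costs; the function names are hypothetical. It evaluates the condition $c(x_1, y_1) + c(x_2, y_2) \le c(x_1, y_2) + c(x_2, y_1)$ for two excluded and two admissible orderings.

```python
def c(x, y):
    """Coulomb-type cost 1/|x - y|, infinite on the diagonal (illustrative choice)."""
    return float("inf") if x == y else 1.0 / abs(x - y)

def pair_is_monotone(p1, p2):
    """Two-point c-cyclical monotonicity:
    c(x1, y1) + c(x2, y2) <= c(x1, y2) + c(x2, y1)."""
    (x1, y1), (x2, y2) = p1, p2
    return c(x1, y1) + c(x2, y2) <= c(x1, y2) + c(x2, y1)

# Excluded configuration (i): x1 < y1 < y2 < x2
excluded_i = pair_is_monotone((0.0, 1.0), (3.0, 2.0))
# Excluded configuration (iii): x1 < x2 < y2 < y1
excluded_iii = pair_is_monotone((0.0, 3.0), (1.0, 2.0))
# Admissible configuration: x1 < x2 < y1 < y2
allowed_a = pair_is_monotone((0.0, 2.0), (1.0, 3.0))
# Admissible configuration: x1 < y2 < y1 < x2
allowed_b = pair_is_monotone((0.0, 2.0), (3.0, 1.0))

print(excluded_i, excluded_iii, allowed_a, allowed_b)
```

In the excluded orderings swapping the partners strictly lowers the total cost, so the pair cannot lie in the support of an optimal measure; in the admissible orderings the swap raises it.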

Lemma 4.2. Assume $(x_1, y_1), (x_2, y_2) \in \operatorname{supp}\gamma$ are such that one of the following four configurations holds: $x_1 < y_2 < x_2 < y_1$ or $y_1 < x_2 < y_2 < x_1$ or $x_1 < y_1 < x_2 < y_2$ or $y_1 < x_1 < y_2 < x_2$. Then, if $(x_3, y_3)$ is another point in $\operatorname{supp}\gamma$, none of the following configurations are possible: $x_i < x_k < y_j < x_j < y_k < y_i$, $x_i < y_k < y_j < x_j < x_k < y_i$, $y_i < x_k < x_j < y_j < y_k < x_i$, $y_i < y_k < x_j < y_j < x_k < x_i$, $x_i < y_i < x_j < y_j < x_k < y_k$ and $y_i < x_i < y_j < x_j < y_k < x_k$, where $i, j, k \in \{1, 2, 3\}$ (see also Figure 2).

Figure 3: Possible configurations by Remark 4.4

Proof. Note that the first 4 configurations are immediately excluded by Lemma 4.1. Let us focus on the penultimate configuration. From the c-cyclical monotonicity property, we have that

$$c(x_1, y_1) + c(x_2, y_2) + c(x_3, y_3) \le c(x_j, y_k) + c(x_i, y_j) + c(x_k, y_i).$$

But due to the fact that $l$ is strictly decreasing, we have that $c(x_i, y_i) > c(x_i, y_j)$, $c(x_j, y_j) > c(x_k, y_i)$ and $c(x_k, y_k) > c(x_j, y_k)$, which gives rise to a contradiction. The last configuration can be dealt with in a similar way. □

Remark 4.3. Note that by Lemma 4.1, if $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3) \in \operatorname{supp}\gamma$, with $x_1 < x_2 < x_3$, the following configurations are also not possible: $x_1 < y_2 < y_1 < x_2 < x_3 < y_3$ and $x_1 < y_2 < y_1 < y_3 < x_2 < x_3$. Similarly, the configurations $y_1 < x_1 < x_2 < y_3 < y_2 < x_3$ and $x_1 < x_2 < y_1 < y_3 < y_2 < x_3$ are not possible.

Remark 4.4. From Lemma 4.1, Lemma 4.2 and Remark 4.3 it follows that the configurations $\mu$-a.e. possible are of the form: $x_1 < x_2 < y_1 < y_2$, $x_1 < y_2 < y_1 < x_2$, $y_1 < y_2 < x_1 < x_2$, $y_1 < x_2 < x_1 < y_2$, $x_1 < y_2 < x_2 < y_1$ and $y_1 < x_2 < y_2 < x_1$ (see also Figure 3).

Definition 4.5. We say that $T^{(1)}$ has no points of decrease on $A \subset I$ if $\mu(\{x \in A : \exists x' \in A,\ x' > x,\ T^{(1)}(x) > T^{(1)}(x')\}) = 0$. We say that $T^{(1)}$ has points of decrease on $A$ with positive measure if $\mu(\{x \in A : \exists x' \in A,\ x' > x,\ T^{(1)}(x) > T^{(1)}(x')\}) > 0$. We say that $T^{(1)}$ is $\mu$-a.e. decreasing on $A \subset I$ if $\mu(\{x \in A : \exists x' \in A,\ x' > x,\ T^{(1)}(x) > T^{(1)}(x')\}) = 1$. We define similarly for $T^{(1)}$ the notions of points of increase on $A$ and $\mu$-a.e. increasing on $A$.

Note that we can assume that the set $B$ of such $x'$ above, for which $T^{(1)}$ has points of decrease (respectively points of increase) on $A$ with positive measure, is also a set of positive measure. Otherwise, if the set of such $x'$ is of $\mu$-measure zero, we may consider the set $A \setminus B$, on which $T^{(1)}$ is $\mu$-a.e. decreasing (respectively $\mu$-a.e. increasing).



Figure 4: Optimal map configurations



Lemma 4.6. The map $T^{(1)}$ cannot be $\mu$-a.e. decreasing on any subset $A \subset I$.

Proof. Assume that $T^{(1)}$ is $\mu$-a.e. decreasing on a subset $A \subset I$. Let $(x_i, y_i)_{i=1}^{3} = (x_i, T^{(1)}(x_i))_{i=1}^{3}$, where $x_i \in A$ for $i \in \{1, 2, 3\}$. Recall now from Remark 4.4 the possible configurations by which $T^{(1)}$ is decreasing.

Assume first that $x_1 < y_2 < x_2 < y_1$. Then the possibilities for $(x_3, y_3)$ such that $T^{(1)}$ is strictly decreasing are: $x_3 < x_1 < y_2 < x_2 < y_1 < y_3$, $x_1 < x_3 < y_2 < y_3 < x_2 < y_1$, $x_1 < y_2 < y_3 < x_3 < x_2 < y_1$, $x_1 < y_2 < x_3 < y_3 < x_2 < y_1$, $y_3 < x_1 < y_2 < x_2 < x_3 < y_1$, $x_1 < y_3 < y_2 < x_2 < x_3 < y_1$, $y_3 < x_1 < y_2 < x_2 < y_1 < x_3$ and $x_1 < y_3 < y_2 < x_2 < y_1 < x_3$. In view of Lemmas 4.1 and 4.2, each of these possibilities can only happen on a set of $\mu$-measure 0. The case $y_1 < x_2 < y_2 < x_1$ can be treated similarly.

Assume next that $x_1 < y_2 < y_1 < x_2$. Then the possibilities for $(x_3, y_3)$ such that $T^{(1)}$ is strictly decreasing are: $x_3 < x_1 < y_2 < y_1 < y_3 < x_2$, $x_3 < x_1 < y_2 < y_1 < x_2 < y_3$, $x_1 < x_3 < y_2 < y_3 < y_1 < x_2$, $x_1 < y_2 < x_3 < y_3 < y_1 < x_2$, $x_1 < y_2 < y_3 < x_3 < y_1 < x_2$, $x_1 < y_2 < y_3 < y_1 < x_3 < x_2$, $y_3 < x_1 < y_2 < y_1 < x_2 < x_3$ and $x_1 < y_3 < y_2 < y_1 < x_2 < x_3$. In view of Lemmas 4.1 and 4.2, each of these possibilities can only happen on a set of $\mu$-measure 0. The case $y_1 < x_2 < x_1 < y_2$ can be treated in a similar way. □



Remark 4.7. We have $\operatorname{supp}\gamma = I \times I$; in particular, $\mathrm{Im}(T^{(1)}) = I$, where we denoted by $\mathrm{Im}(T^{(1)})$ the image of $T^{(1)}$.

Proof. Note first that $\gamma(I \times I) = \gamma(I \times \mathbb{R}) = \mu(I) = 1$. Let us now assume that $\operatorname{supp}\gamma = I \times J$, with $I, J \subset \mathbb{R}$ and $J \subset I$, with $\mu(I \setminus J) > 0$. Then $1 = \gamma(I \times J) = \gamma(I \times \mathbb{R}) = \mu(I) = \gamma(\mathbb{R} \times J) = \mu(J)$, which contradicts the definition of the support $I$ of the marginals. □

For the proof of the next theorem, we will use the results of Theorems 3.25 and 3.27; in particular, we will use the properties of the map $T^{(1)}$ as given in those two theorems.

Theorem 4.8. Let $\alpha, \beta \in \mathbb{R}$ with $\alpha < \beta$ and let $I = [\alpha, \beta]$. There exists $a \in I$ such that $T^{(1)}$ is $\mu$-a.e. increasing on $[\alpha, a)$ and $\mu$-a.e. increasing on $(a, \beta]$, with $T^{(1)}(\alpha) = T^{(1)}(\beta) = a$, $T^{(1)}(a^-) = \beta$ and $T^{(1)}(a^+) = \alpha$, with discontinuity at $a$. Except on a set of $\mu$-measure zero, we have

(a) For all $x_1, x_1' \in (\alpha, a)$ with $x_1 < x_1'$, we have $x_1 < x_1' < T^{(1)}(x_1) < T^{(1)}(x_1')$, with $T^{(1)}(x) \in (a, \beta)$;

(b) For all $x_2, x_2' \in (a, \beta)$ with $x_2 < x_2'$, we have $T^{(1)}(x_2) < T^{(1)}(x_2') < x_2 < x_2'$, with $T^{(1)}(x) \in (\alpha, a)$;

(c) For every interval $(l_1, l_2) \subset I$, we have $T^{(1)}((l_1, l_2)) = (\tilde l_1, \tilde l_2) \subset I$.

Moreover, $a$ is such that $\mu([\alpha, a]) = \mu([a, \beta]) = \tfrac{1}{2}$ (see also Figure 4).
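The structure described in the theorem pins the map down once the cumulative distribution function of $\mu$ is known. The following is a minimal sketch, under the assumption (read off from the monotonicity, the push-forward property and the mass balance at $a$, not quoted as a formula from this section) that $F(T^{(1)}(x)) = F(x) + \tfrac{1}{2}$ to the left of $a$ and $F(T^{(1)}(x)) = F(x) - \tfrac{1}{2}$ to the right, where $F$ is the CDF of $\mu$. The example density $\rho(x) = 2x$ on $[0, 1]$ and the helper `make_map` are illustrative choices.

```python
import math

def make_map(F, Q):
    """Build T from a CDF F and its inverse Q (both assumed continuous and
    strictly increasing on the support) by shifting the CDF level by 1/2."""
    def T(x):
        u = F(x)
        if u < 0.5:
            return Q(u + 0.5)     # left of the median: jump to the right half
        if u > 0.5:
            return Q(u - 0.5)     # right of the median: jump to the left half
        raise ValueError("T jumps at the median a; no value is assigned there")
    return T

# Illustrative density rho(x) = 2x on [0, 1]: F(x) = x^2, Q(u) = sqrt(u).
def F(x):
    return x * x

Q = math.sqrt
T = make_map(F, Q)
a = Q(0.5)                         # median, so mu([alpha, a]) = mu([a, beta]) = 1/2

left = [T(x) for x in (0.0, 0.2, 0.4, 0.6)]    # images of points in [alpha, a)
right = [T(x) for x in (0.8, 0.9, 1.0)]        # images of points in (a, beta]
print(a, left, right)
```

In this example the images of the left half increase from $a$ towards $\beta = 1$, the images of the right half increase from near $\alpha = 0$ up to $a$, and applying the map twice returns the starting point.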

Proof. Recall first from Remark 3.24 that

$$\mu(\{x \in \mathbb{R} : T^{(1)}(x) = x\}) = 0.$$

Recall also from Theorem 3.25 that $T^{(1)}$ is a bijective map; in particular, $\mu(\{x \in I : \exists y \in I \setminus \{x\},\ T^{(1)}(x) = T^{(1)}(y)\}) = 0$.



Step 1. $T^{(1)}$ cannot be $\mu$-a.e. increasing on $I$:

Assume that $T^{(1)}$ is $\mu$-a.e. increasing on $I$. Then $T^{(1)}$ is $\mu$-a.e. strictly increasing and it can only be increasing as described in Remark 4.4. Suppose first that for $\mu$-a.e. all $x_1, x_2 \in I$, we have $x_1 < x_2 < T^{(1)}(x_1) < T^{(1)}(x_2)$. Then for each $x \in [\alpha, \beta]$, we have two possibilities: $T^{(1)}(\alpha, x) = (\alpha, T^{(1)}(x))$ or $T^{(1)}(\alpha, x) \subset (c, T^{(1)}(x))$, with $\alpha < c$. If $T^{(1)}(\alpha, x) = (\alpha, T^{(1)}(x))$, using $\mu = \mu \circ (T^{(1)})^{-1}$ and the fact that $\rho > 0$, we get that $T^{(1)}(x) = x$ $\mu$-a.e., which contradicts Remark 3.24. If $T^{(1)}(\alpha, x) \subset (c, T^{(1)}(x))$, then $T^{(1)}(\alpha, \beta) \subset (c, T^{(1)}(\beta))$. Using again $\mu = \mu \circ (T^{(1)})^{-1}$, we get that $\mu(\alpha, \beta) = 1 \le \mu(c, T^{(1)}(\beta))$, where $\alpha < c$, which would contradict the definition of $\operatorname{supp}\mu = [\alpha, \beta]$. The case with $T^{(1)}(x_1) < T^{(1)}(x_2) < x_1 < x_2$ for $\mu$-a.e. all $x_1, x_2 \in I$ can be discounted in the same way.

Step 2. $T^{(1)}$ cannot have both points of increase and points of decrease on every interval $A \subseteq I$
with positive measure:

Assume that there exists an interval $A \subseteq I$ such that $T^{(1)}$ has both points of increase and points of
decrease on $A$ with positive measure. Take an arbitrary point $x_1 \in A$ such that $\exists x_1' \in A$, $x_1' > x_1$ and
$T^{(1)}(x_1') > T^{(1)}(x_1)$. By assumption, for a subset of such $x_1$ (and for a subset of $x_1'$) in $A$ of positive measure,
$\exists x_2, x_2' \in (x_1, x_1')$ with $x_2' > x_2$ and $T^{(1)}(x_2') < T^{(1)}(x_2)$. Assume now that $x_1 < x_1' < T^{(1)}(x_1) < T^{(1)}(x_1')$.
Due to the $\mu$-a.e. possible configurations as given by Remark 4.4, $x_2'$ is such that $T^{(1)}(x_2') < x_2'$. Then the
$c$-cyclical monotonicity of the optimal support fails for $(x_1', T^{(1)}(x_1'))$ and $(x_2', T^{(1)}(x_2'))$. If we denote by $B$
the set of such $x_1'$, we have $\gamma(B, T^{(1)}(B)) = \mu(B) > 0$, which contradicts the assumption on the optimal
support. The case with $T^{(1)}(x_1) < T^{(1)}(x_1') < x_1 < x_1'$ can be treated similarly, so its proof will be omitted.

Step 3. There exists {a.,a.i), (/3i,/3) C /, with cci < /3i, such that /x-a.e. for all Xi E (q:,q:i) we 
have T^^^(xi) > Xx and fi-a.e. for all X2 E {f3i,(3) we have T'--'-\x2) < X2: 

Recall first the possible configurations, as given by Remark 14.41 Note now that in view of Step 1, Step 2 
and of Lemma 14.61 there exists (a, ai), (/3i, (3) C /, with ai < j3i on which T^^-* is an increasing function. It 
remains to show that the optimal map can only be such that /i-a.c. for all xi € (a, ai), we have T'^'(xi) > xi 
and /x-a.e. for all X2 G (/3i,/3), we have T(^^(x2) < X2. 

Let us consider the alternatives one by one. Suppose to begin with that p.{xi G (a, ai) : r(^^(xi) < xi) > 
and ^(x2 G : X2 < T(i)(x2)) > 0. By Lemma liTTl if xi G {a,ai) with T'^^\xi) < xi and X2 G (/3i,/3) 

with X2 < T(^'(x2)), c-cyclical monotonicity of the support fails for (xi, r'^^'(xi)) and (x2, r'^^^(x2)). Assume 
next that /i(xi G (a, ai) : T(i)(xi) < xi) > and /u(x2 G (/3i,/3) : T''^\x2) < X2) > 0. In view of Step 1, 
of Lemma [4.61 and of Lemma [4.71 there exists then some subset {a'i,P'i) C (ai,^i) such that with positive 
measure, T^-'^' has both points of increase and points of decrease on {a[,l3[). But this contradicts the 
conclusion of Step 2 and therefore our assumption has to be wrong. The case with p{xi G (a, ai) : xi < 
r(i)(xi)) >0and//(x2 G : X2 < T(^)(x2)) > can be reasoned similarly, so its proof will be omitted. 

Step 4. T(i)(a) > TW(f3): 

Assume that T(i)(a) < r(i)(/3). By Step 3, there exist (a, a') C {a,T^^'>{a)) and C (r(i)(/3), /3) on 

which T(i) is as described in (a) and (b). Then, if xi G {a, a') and X2 G (/3',/3), c-cyclical monotonicity 
of supp 7 would fail for (xi, r'-^^(xi)) and (x2, r'-^-'(x2)), as shown in Lemma B?T] (i). Therefore, r''^'(a) > 
T(i)(/3). 

Step 5. There exists $b \in I$ such that $T^{(1)}$ is as described in (a) and (b), for $(\alpha, b) \subset I$ and for
$(b, \beta) \subset I$, respectively:

Note that by Lemma 4.6, Step 1 and Step 2, $T^{(1)}$ has to be $\mu$-a.e. increasing on a certain number of sub-intervals
of $(\alpha_1, \beta_1)$. On any such sub-interval, either $T^{(1)}(x) < x$ $\mu$-a.e. or $T^{(1)}(x) > x$ $\mu$-a.e. In both these
cases, due to the form of $T^{(1)}$ on $(\alpha, \alpha_1)$ and $(\beta_1, \beta)$, as proved in Step 3, the $c$-cyclical monotonicity of the
support would fail on a set of positive measure unless $\mu((\alpha_1, \beta_1)) = 0$.

Step 6. For every interval $(l_1, l_2) \subseteq I$ we have $T^{(1)}((l_1, l_2)) = (r_1, r_2) \subseteq I$:

This is a simple consequence of Step 5 and of Remark 4.7.

Step 7. $b = T^{(1)}(\alpha) = T^{(1)}(\beta)$, $T^{(1)}(b_-) = \beta$ and $T^{(1)}(b_+) = \alpha$:

Note first that $T^{(1)}(b_-) = \beta$ and $T^{(1)}(b_+) = \alpha$, or else $(\alpha, b)$ and $(b, \beta)$ will be mapped into a smaller interval
than $I$, which contradicts Remark 4.7. By Step 4, we have $T^{(1)}(\alpha) \ge T^{(1)}(\beta)$. It remains to prove that $b = T^{(1)}(\alpha) = T^{(1)}(\beta)$. If this does not hold, the alternatives are: $T^{(1)}(\alpha) > b > T^{(1)}(\beta)$, $T^{(1)}(\alpha) \ge T^{(1)}(\beta) > b$
and $T^{(1)}(\beta) \le T^{(1)}(\alpha) < b$. We will only show the reasoning for the case $T^{(1)}(\alpha) > b > T^{(1)}(\beta)$, as the other
two possibilities can be dealt with in a similar way. In this first case, we map $(\alpha, b)$ to $(T^{(1)}(\alpha), \beta)$ and $(b, \beta)$
to $(\alpha, T^{(1)}(\beta))$. Therefore, $\operatorname{supp} \gamma \subset I \times \big((\alpha, T^{(1)}(\beta)) \cup (T^{(1)}(\alpha), \beta)\big) \subsetneq I \times I$, which contradicts Remark 4.7.

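Step 8. $\mu([\alpha, a]) = \mu([a, \beta]) = \tfrac{1}{2}$:

$(\alpha, a)$ is mapped into $(a, \beta)$ and $(a, \beta)$ is mapped into $(\alpha, a)$. Therefore

\[ \gamma((\alpha, a) \times (a, \beta)) + \gamma((a, \beta) \times (\alpha, a)) = 1. \]

But

\[ \gamma((\alpha, a) \times (a, \beta)) = \gamma((\alpha, a) \times I) = \mu((\alpha, a)), \]
as $\gamma((\alpha, a) \times (\alpha, a)) = 0$. Similarly,

\[ \gamma((a, \beta) \times (\alpha, a)) = \gamma(I \times (\alpha, a)) = \mu((\alpha, a)). \]
It follows that $2\mu((\alpha, a)) = 1$, or $\mu((\alpha, a)) = \tfrac{1}{2}$. □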

Theorem 4.9. Assume that $\mu = \nu$ with density $\rho(x) > 0$ on $I = [\alpha, \beta]$, where $\alpha, \beta \in \mathbb{R} \cup \{\pm\infty\}$. Let
$\mu_1(x) := \mu((\alpha, x))$, $\bar\mu_1(x) := \mu((x, a))$ for $x \in (\alpha, a)$, $\mu_2(x) := \mu((x, \beta))$ and $\bar\mu_2(x) := \mu((a, x))$ for $x \in (a, \beta)$. If
$x \in (\alpha, a)$, we have $T^{(1)}(x) = \bar\mu_2^{-1}(\mu_1(x))$, and if $x \in (a, \beta)$, we have $T^{(1)}(x) = \bar\mu_1^{-1}(\mu_2(x))$.

Proof. We will use the fact that $\mu \circ T^{(1)} = \mu$ to find $T^{(1)}$. Let $x \in (\alpha, a)$. Then due to the properties of $T^{(1)}$
from Theorem 4.8 it follows that $T^{(1)}((\alpha, x)) = (a, T^{(1)}(x))$. Therefore,

\[ \mu_1(x) = \mu((\alpha, x)) = \mu((a, T^{(1)}(x))) = \bar\mu_2(T^{(1)}(x)). \]

We know that $\rho(x) > 0$. Due to the fact that $T^{(1)}(x)$ is increasing on $(\alpha, a)$ and with $T^{(1)}(a_-) = \beta$,
$\bar\mu_2(T^{(1)}(x))$ is a strictly increasing function. We can take inverses and have

\[ T^{(1)}(x) = \bar\mu_2^{-1}(\mu_1(x)). \]

A similar reasoning holds for $x \in (a, \beta)$. □
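As a quick sanity check of the representation formula in Theorem 4.9, the following minimal sketch evaluates the map on $I = [0, 1]$. The density $\rho(x) = \tfrac{2}{3}(1 + x)$ is a hypothetical choice made only for illustration (it does not appear in the text), as are the function names. Writing $F$ for the cumulative distribution function of $\mu$ and using $F(a) = \tfrac12$, the two cases of the theorem reduce to $F(T^{(1)}(x)) = F(x) + \tfrac12$ on the left of $a$ and $F(T^{(1)}(x)) = F(x) - \tfrac12$ on the right.

```python
import math

# Hypothetical example density rho(x) = (2/3) * (1 + x) on [0, 1] (not from the paper).
def cdf(x):
    """F(x) = mu((0, x)) for rho(x) = (2/3)(1 + x)."""
    return (2.0 * x + x * x) / 3.0

def cdf_inv(p):
    """Inverse of F on [0, 1]: solve x^2 + 2x - 3p = 0 for x >= 0."""
    return -1.0 + math.sqrt(1.0 + 3.0 * p)

def transport_map(x):
    """Map of Theorem 4.9 written via the CDF: shift the mass level by one half."""
    p = cdf(x)
    if p < 0.5:
        return cdf_inv(p + 0.5)   # x in (alpha, a) is sent to (a, beta)
    return cdf_inv(p - 0.5)       # x in (a, beta) is sent to (alpha, a)

median = cdf_inv(0.5)             # the point a with mu((alpha, a)) = 1/2
```

In this sketch the map is increasing on each side of the median, jumps at the median, and applying it twice returns the starting point, in line with properties (a) and (b) of Theorem 4.8.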



4.2 Equal radially symmetric marginals in dimension d 

We assume in this subsection that the marginals $\mu$ and $\nu$ are radially symmetric and in $\mathcal{P}(\mathbb{R}^d)$. As before,
we suppose that the cost function is given by $c(x, y) = \ell(|x - y|) \ge 0$, with $c$ and $\ell$ satisfying (A1)-(A4).

Theorem 4.10. Suppose that $\mu = \nu$, with common density $\rho(x) = \lambda(|x|)$ for all $x \in \operatorname{supp} \rho$. Moreover, we
assume that $\rho(x) > 0$ for all $x \in \operatorname{supp} \rho$. Then the optimal transport map $T^{(1)}$ has to be radially symmetric
itself, that is

\[ T^{(1)}(x) = g(|x|)\,\frac{x}{|x|}, \qquad x \in \mathbb{R}^d, \tag{4.1} \]

for some function $g : [0, \infty) \to \mathbb{R}$. Moreover $g < 0$, and $g$ is an increasing function with $g(0_+) = -\infty$ and
$g(+\infty) = 0$.

Proof. Step 1. $T(Rx) = RT(x)$ for all $R \in O(d)$ and all $x \in \operatorname{supp} \mu$ (here $O(d)$ denotes the group
of orthogonal $d \times d$ matrices):

Let $\gamma_T$ be a minimizer of $C$ on $\Gamma(\rho, \rho)$. Then $(R \times R)_\sharp \gamma_T$ is also a minimizer, for any $R \in O(d)$, since it
belongs to $\Gamma(\rho, \rho)$ by the radial symmetry of $\rho$, and has the same cost $C$ as $\gamma_T$ by the invariance of the cost
function $c(x, y)$ under $(x, y) \mapsto (R^{-1}x, R^{-1}y)$. Hence by uniqueness, $\gamma_T = (R \times R)_\sharp \gamma_T$. But an elementary
calculation shows that the latter is equivalent to $T(z) = RT(R^{-1}z)$ for all $z \in \operatorname{supp} \rho$. Indeed, the left, respectively right
hand side, evaluated on a set $A \times B$ give $\int \chi_A(x)\chi_B(T(x))\rho(x)\,dx$, respectively $\int \chi_A(Rx)\chi_B(RT(x))\rho(x)\,dx$.
A change of variables together with the radial symmetry of $\rho$ shows that the latter expression equals
$\int \chi_A(z)\chi_B(RT(R^{-1}z))\rho(z)\,dz$. Comparing with the former expression yields the assertion.

Step 2. $T$ is radial and direction reversing, i.e. $T(x) = g(|x|)\frac{x}{|x|}$ for some $g < 0$:

Let $e_1$ be a fixed unit vector in $\mathbb{R}^d$ and $r > 0$. By Step 1, for all $R \in O(d)$ we have

\[ T(Rre_1) = RT(re_1) = Rf(r)v(r) \quad \text{with } f(r) := |T(re_1)| \text{ and } v(r) := T(re_1)/|T(re_1)|. \tag{4.2} \]

Hence

\[ I[T] = \int \ell(|x - T(x)|)\,\rho(x)\,dx = \int \ell\big(\big|\,|x|e_1 - f(|x|)v(|x|)\,\big|\big)\,\rho(x)\,dx. \]

But $\ell$ is by assumption strictly decreasing and $|re_1 - f(r)v(r)|$ is maximized among unit vectors $v(r)$ if and
only if $v(r) = -e_1$. Hence, since $T$ minimizes $I$, $v(r) = -e_1$. Substituting into (4.2) yields the assertion,
with $g(r) = -f(r) = -|T(re_1)|$.

Step 3. $g$ solves a one-dimensional mass transportation problem:

For any Borel map $T$ on $\mathbb{R}^d$, abbreviate $I_\rho[T] := \int \ell(|x - T(x)|)\rho(x)\,dx$ (Monge functional with map $T$ and
equal marginals $\rho(x) = \lambda(|x|)$). If $T$ is a radial map, i.e. of form $T(x) = g(|x|)\frac{x}{|x|}$ for some Borel $g : [0, \infty) \to \mathbb{R}$,
and $\tilde g$ denotes the antisymmetric extension of $g$ to $\mathbb{R}$, such that, in particular, $T(x, 0, 0) = (\tilde g(x), 0, 0)$ for
all $x \in \mathbb{R}$, then using polar coordinates (with $|S^{d-1}|$ denoting the Hausdorff measure of the unit sphere in
$\mathbb{R}^d$)

\[ I_\rho[T] = \int_{r=0}^{\infty} \ell(|r - g(r)|)\,|S^{d-1}|\,r^{d-1}\lambda(r)\,dr = \int_{-\infty}^{\infty} \ell(|s - \tilde g(s)|)\,\underbrace{\tfrac{1}{2}|S^{d-1}|\,|s|^{d-1}\lambda(|s|)}_{=:\rho_1(s)}\,ds = I_{\rho_1}[\tilde g]. \]

Hence the $d$-dimensional Monge problem of minimizing $I_\rho$ over radial maps is equivalent to the one-dimensional
Monge problem of minimizing $I_{\rho_1}$ over antisymmetric maps, and, because of Step 1 (with
$d = 1$ and $R = -I$), to the one-dimensional Monge problem of minimizing $I_{\rho_1}$ over arbitrary maps. It
follows that the function $g$ in (4.1), antisymmetrically extended to $\mathbb{R}$, is a minimizer of $I_{\rho_1}$. The asserted
properties of $g$ now follow immediately from Theorem 4.8 and the fact that $\rho_1$ (being symmetric) has median
0. □

Corollary 4.11. Suppose that $\mu = \nu$ are as in Theorem 4.10. Let $t \in (0, \infty)$ and denote by

\[ F_1(t) = |S^{d-1}| \int_0^t \lambda(s)s^{d-1}\,ds \quad \text{and} \quad F_2(-t) = |S^{d-1}| \int_t^\infty \lambda(s)s^{d-1}\,ds. \]

Then the function $g$ in (4.1) is given by

\[ g(t) = F_2^{-1}(F_1(t)). \]

Proof. We have already shown that $g$, antisymmetrically extended to $\mathbb{R}$, minimizes the one-dimensional
functional $I_{\rho_1}$, with $\rho_1$ as in Step 3 above. The assertion is now a direct consequence of the representation
formula given in Theorem 4.9. □

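Example 4.12. (exponential radially symmetric distribution) Assume now that $\rho(x) = \frac{1}{Z}e^{-|x|}$ for $x = (x_1, x_2, x_3) \in \mathbb{R}^3$, where $Z$ is the normalizing constant, that is, $Z = \int e^{-|x|}\,dx_1\,dx_2\,dx_3$. Then for $t \in (0, \infty)$,
we have

\[ F_1(t) = 1 - \Big(1 + t + \frac{t^2}{2}\Big)e^{-t}, \quad F_2(-t) = e^{-t}\Big(1 + t + \frac{t^2}{2}\Big) \quad \text{and} \quad g(t) = F_2^{-1}(F_1(t)). \]

□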
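Since $F_2$ has no closed-form inverse here, $g$ has to be evaluated numerically. The sketch below is one possible way to do this, assuming the formulas for $F_1$ and $F_2$ stated in Example 4.12; the bisection solver and all function names are illustrative choices, not part of the original text.

```python
import math

def F1(t):
    """Mass of the ball of radius t for rho(x) = exp(-|x|) / Z in R^3."""
    return 1.0 - (1.0 + t + 0.5 * t * t) * math.exp(-t)

def F2_at_minus(t):
    """F2(-t): mass outside the ball of radius t (decreasing in t)."""
    return (1.0 + t + 0.5 * t * t) * math.exp(-t)

def g(t):
    """Solve F2(g) = F1(t) for g < 0 by bisection on u = -g > 0."""
    target = F1(t)
    lo, hi = 0.0, 1.0
    while F2_at_minus(hi) > target:   # enlarge the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):              # bisection; F2_at_minus is strictly decreasing
        mid = 0.5 * (lo + hi)
        if F2_at_minus(mid) > target:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)
```

For small $t$ the value $g(t)$ is large and negative, and it increases towards 0 as $t$ grows, consistent with Theorem 4.10. The antisymmetric extension is an involution, which in terms of $g$ reads $g(-g(t)) = -t$.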



Figure 5: Optimal transport map $T$ for the
density $\rho(x) = \text{const} \times e^{-|x|}$. As shown in
the text, $T$ leaves lines through the origin
invariant, e.g. $T(x, 0, 0) = (g(x), 0, 0)$ for
all $x$, and the figure shows the function $g$.



Figure 6: Optimal transport map T for the 
density p{x) = const x e"'^' 



Example 4.13. (normal radially symmetric distribution) Assume now that p{x) — -g-e 1^' for x G 
where Z is the normalizing constant. Then for t G (0, 00), we have 



Fi{t) 



e-^'^^s^ds, F2{-t) 



e-^'^'^s^ds and g{t) = F^^{Fi{t)). 



□ 



5 Asymptotic exactness of the optimal transport functional in the 
semiclassical limit 

Our goal in this section is to compare the exact quantum mechanical ground state energy to the approximate
DFT ground state energy obtained by replacing $V_{ee}$ by the optimal transportation functional. Recall from
(2.3) and (2.4) that the exact ground state energy of an $N$-electron system is defined as

\[ E_0^{QM} = \inf_{\Psi \in \mathcal{A}} \big\{ T[\Psi] + V_{ne}[\rho^\Psi] + V_{ee}[\rho_2^\Psi] \big\} \tag{5.1} \]
and the approximate ground state energy is (recall the DFT formalism in (2.15), (2.16))

\[ E_0^{DFT\text{-}OT} = \inf_{\Psi \in \mathcal{A}} \big\{ T[\Psi] + V_{ne}[\rho^\Psi] + E_{OT}[\rho^\Psi] \big\} = \inf_{\rho \in \mathcal{R}} \big\{ T_{QM}[\rho] + V_{ne}[\rho] + E_{OT}[\rho] \big\}. \tag{5.2} \]

In the above, $\rho_2^\Psi$ and $\rho^\Psi$ denote the pair density, respectively the single particle density of $\Psi$ (see (2.5), (2.6)
and (2.7)), and $E_{OT}$ is the optimal transportation functional with Coulomb cost introduced earlier,

\[ E_{OT}[\rho] = \inf_{\gamma \in \Gamma(\rho, \rho)} \int_{\mathbb{R}^6} \frac{1}{|x - y|}\,d\gamma(x, y). \tag{5.3} \]

Due to the fact that $\rho^\Psi$ is the marginal of $\rho_2^\Psi$, we have

\[ V_{ee}[\rho_2^\Psi] \ge E_{OT}[\rho^\Psi] \quad \text{for every } \Psi \in \mathcal{A}. \tag{5.4} \]

Taking the infimum over $\Psi$ gives

Theorem 5.1. For every $N$, and any potential $v \in L^{3/2}(\mathbb{R}^3) + L^\infty(\mathbb{R}^3)$, the density functional with electron-electron
interaction energy given by the mass transportation functional is a rigorous lower bound:

\[ E_0^{QM} \ge E_0^{DFT\text{-}OT}. \]

Now consider the kinetic energy functional T[^] from Section 2.1 with physical constants inserted, 

i—l 

Here $m$ is the mass of the electron and $\hbar$ is Planck's constant $h$ divided by $2\pi$. We are interested in the limit
$\hbar \to 0$ (semiclassical limit). Define now $E_0^{QM}(\hbar)$ and $E_0^{DFT\text{-}OT}(\hbar)$ as in (5.1) and (5.2), but with $T[\Psi]$
replaced by $T_\hbar[\Psi]$. Note now that the statement

\[ \frac{E_0^{QM}(\hbar)}{E_0^{DFT\text{-}OT}(\hbar)} \to 1 \quad \text{as } \hbar \to 0 \tag{5.6} \]

is in general false. The reason is that when $\hbar$ gets small, then (for typical $V_{ne}$) the ground state densities of
both models contract, and the approximation is not uniformly good on families of contracting densities.

This has nothing particular to do with the use of $E_{OT}$; rather, (5.6) fails for any DFT model (2.15) whose
electron interaction functional $\tilde V_{ee}$ has the correct scaling under dilations,

\[ \tilde V_{ee}[\alpha^3\rho(\alpha\,\cdot)] = \alpha\,\tilde V_{ee}[\rho(\cdot)], \tag{5.7} \]

such as the mean field functional (2.18) or the local density approximation (2.23). A counterexample is
already given by atoms, $v(x) = -Z/|x|$ (eq. (2.1) in Section 2, with $a = 1$). Very remarkably, in this case

\[ E_0^{QM}(\hbar) = \frac{E_0^{QM}(1)}{\hbar^2} \quad \text{and} \quad E_0^{DFT\text{-}OT}(\hbar) = \frac{E_0^{DFT\text{-}OT}(1)}{\hbar^2}, \tag{5.8} \]

and hence the quotient $E_0^{QM}(\hbar)/E_0^{DFT\text{-}OT}(\hbar)$ is independent of $\hbar$! To prove this, use that the four functionals
involved, $T$, $V_{ne}$, $V_{ee}$, and $\tilde V_{ee}$, all have a definite scaling behaviour with respect to dilations. For a given
$\Psi \in \mathcal{A}$, consider its $L^2$-norm-preserving dilation

\[ \Psi_\hbar(x_1, \ldots, x_N) := (\hbar^{-2})^{3N/2}\,\Psi(\hbar^{-2}x_1, \ldots, \hbar^{-2}x_N). \]

Then (with $T_\hbar$ being the kinetic energy with prefactor $\hbar^2/2m$ from (5.5))

\[ (T_\hbar + V_{ne} + V_{ee})[\Psi_\hbar] = \hbar^{-2}\,(T_1 + V_{ne} + V_{ee})[\Psi]. \]

Taking the infimum over $\Psi$ gives the first assertion in (5.8). The second assertion follows analogously after
noting that

\[ \rho^{\Psi_\hbar}(x) = (\hbar^2)^{-3}\rho^\Psi(\hbar^{-2}x), \qquad E_{OT}[\rho^{\Psi_\hbar}] = \hbar^{-2}E_{OT}[\rho^\Psi] \]

(or more generally $\tilde V_{ee}[\rho^{\Psi_\hbar}] = \hbar^{-2}\tilde V_{ee}[\rho^\Psi]$ for every $\tilde V_{ee}$ satisfying (5.7)).

What we can prove is the following "pointwise" statement in which we only minimize out ^ at fixed p: 



Theorem 5.2. Let $N = 2$. Then

\[ \lim_{\hbar \to 0} F_{HK}[\rho] = E_{OT}[\rho] \quad \text{for every } \rho \in \mathcal{R}. \]

Here $\mathcal{R}$ is the natural class of densities given by the image of $\mathcal{A}$ under the map $\Psi \mapsto \rho^\Psi$ ($\mathcal{R}$ is defined in
(2.11)), and $F_{HK}$ is the Hohenberg-Kohn functional (2.13) with kinetic energy functional $T_\hbar$ in place of $T$.

The case $N = 2$ already contains the main analytic issue, namely that the optimal transport measure
$\gamma$ is singular and so its square root fails to be in $L^2$ and fails to have an $L^2$ gradient. But this case allows us
to avoid the quantum mechanical issues of spin and antisymmetry, which would enter on top of this when
$N \ge 3$.

An interesting challenge raised by the above theorem is to derive higher order corrections to Eqt in the 
semiclassical limit. 



5.1 Re-instating the constraint 

In order to show that $\lim_{\hbar \to 0} F_{HK}[\rho] = E_{OT}[\rho]$, we will need to make modifications to the optimal plan $\gamma$
which yields $E_{OT}[\rho]$, since any $\Psi$ which represents $\gamma$ has $T[\Psi] = +\infty$. Therefore we cannot use these $\Psi$'s
as trial states in the variational principle for $F_{HK}[\rho]$. Hence, we will need to modify the optimal $\gamma$. But the
modifications that one would like to use, e.g. smoothing, lead to modified marginals.

Hence we need to be able to control the change in $E_{OT}[\rho]$ induced by a small change in $\rho$. This is not trivial,
due to the rigid infinite-dimensional constraint in the variational principle for $E_{OT}$ that the trial states must
have marginals exactly equal to $\rho$, and is achieved in Theorem 5.3 below.



The main technical idea behind this theorem is the following construction to "re-instate the constraint",
i.e. to deform a given trial plan into a nearby one with prescribed marginals. Suppose we are given an
arbitrary transport plan $\gamma_{A \to A}$ with equal marginals $\rho_A$, and an arbitrary second density $\rho_B$. We assume
that $\rho_A, \rho_B \in L^1 \cap L^3(\mathbb{R}^3)$, $\rho_A, \rho_B \ge 0$, and $\int_{\mathbb{R}^3} \rho_A(x)\,dx = \int_{\mathbb{R}^3} \rho_B(x)\,dx = 1$. Our interest is in the case
when $\rho_B$ is near $\rho_A$, but the construction works for general $\rho_B$.

Intuitively, the plan $\gamma_{B \to B}$ with equal marginals $\rho_B$ we have in mind is the following.

• First transport $\rho_B$ to $\rho_A$ by a transport plan $\gamma_{B \to A}$ that does not move much mass around when $\rho_B$
is close to $\rho_A$.

• Then apply the plan $\gamma_{A \to A}$.

• Finally transport $\rho_A$ back to $\rho_B$.

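First, let us construct a suitable plan $\gamma_{A \to B}$. Let $f(x) := \min\{\rho_A(x), \rho_B(x)\}$. Take $f_A := (\rho_A - f)_+$ and
$f_B := (\rho_B - f)_+$. Then $\rho_A = f + f_A$ and $\rho_B = f + f_B$.

On $f$ we "do nothing", i.e. we let:

\[ \gamma_{f \to f}(x, y) = f(x)\,\delta_x(y). \]

On $f_A$ we transport to $f_B$ via a convenient plan which allows simple estimates (note that $\int_{\mathbb{R}^3} f_A(x)\,dx = \int_{\mathbb{R}^3} f_B(x)\,dx$, due to the fact that $\int_{\mathbb{R}^3} \rho_A(x)\,dx = \int_{\mathbb{R}^3} \rho_B(x)\,dx$):

\[ \gamma_{f_A \to f_B}(x, y) = \frac{f_A(x)f_B(y)}{\int_{\mathbb{R}^3} f_B(x)\,dx}. \]

We then set

\[ \gamma_{A \to B}(x, y) = \gamma_{f \to f}(x, y) + \gamma_{f_A \to f_B}(x, y) = f(x)\,\delta_x(y) + \frac{f_A(x)f_B(y)}{\int_{\mathbb{R}^3} f_B(y)\,dy}. \tag{5.9} \]

Note that $\int_{\mathbb{R}^3} \gamma_{A \to B}(x, y)\,dy = f(x) + f_A(x) = \rho_A(x)$ and $\int_{\mathbb{R}^3} \gamma_{A \to B}(x, y)\,dx = f(y) + f_B(y)\frac{\int_{\mathbb{R}^3} f_A(x)\,dx}{\int_{\mathbb{R}^3} f_B(x)\,dx} = f(y) + f_B(y) = \rho_B(y)$, as required. We will also need the reverse plan

\[ \gamma_{B \to A}(x, y) = f(x)\,\delta_x(y) + \frac{f_B(x)f_A(y)}{\int_{\mathbb{R}^3} f_A(y)\,dy}, \tag{5.10} \]

which satisfies $\int_{\mathbb{R}^3} \gamma_{B \to A}(x, y)\,dy = \rho_B(x)$ and $\int_{\mathbb{R}^3} \gamma_{B \to A}(x, y)\,dx = \rho_A(y)$. Finally we introduce the combined
plan

\[ P(x, w) := \int_{\mathbb{R}^3}\int_{\mathbb{R}^3} \gamma_{B \to A}(x, y)\,\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}\,\gamma_{A \to A}(y, z)\,\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}\,\gamma_{A \to B}(z, w)\,dy\,dz. \tag{5.11} \]

We now claim that

\[ \int_{\mathbb{R}^3} P(x, w)\,dw = \rho_B(x) \quad \text{and} \quad \int_{\mathbb{R}^3} P(x, w)\,dx = \rho_B(w). \tag{5.12} \]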



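To prove the first claim, we begin by integrating over $w$. This yields

\[ \int_{\mathbb{R}^3} P(x, w)\,dw = \int_{\mathbb{R}^3}\int_{\mathbb{R}^3} \gamma_{B \to A}(x, y)\,\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}\,\gamma_{A \to A}(y, z)\,\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}\,\rho_A(z)\,dy\,dz. \]

Noting that $\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}\rho_A(z) = 1$ whenever $\gamma_{A \to A}(y, z) > 0$ and recalling that $\int_{\mathbb{R}^3} \gamma_{A \to A}(y, z)\,dz = \rho_A(y)$,
integrating over $z$ yields

\[ \int_{\mathbb{R}^3} P(x, w)\,dw = \int_{\mathbb{R}^3} \gamma_{B \to A}(x, y)\,\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}\,\rho_A(y)\,dy. \]

Since $\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}\rho_A(y) = 1$ whenever $\gamma_{B \to A}(x, y) > 0$ and since $\int_{\mathbb{R}^3} \gamma_{B \to A}(x, y)\,dy = \rho_B(x)$, the right hand side
becomes equal to $\rho_B(x)$ after integrating over $y$. The second marginal condition can be derived analogously.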
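The construction can be tested on a finite state space, where densities become probability vectors, plans become matrices, and the integrals in (5.9)-(5.11) become sums. The following is a minimal sketch under that discretization assumption; the function names and the matrix-product formulation are illustrative and not part of the original argument.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def reinstate_constraint(gamma_AA, rho_A, rho_B):
    """Discrete analogue of the combined plan P: marginals rho_A become rho_B."""
    n = len(rho_A)
    f = [min(a, b) for a, b in zip(rho_A, rho_B)]      # common part of the densities
    f_A = [a - c for a, c in zip(rho_A, f)]            # excess of rho_A over f
    f_B = [b - c for b, c in zip(rho_B, f)]            # excess of rho_B over f
    mass = sum(f_B)                                    # equals sum(f_A)

    def glue(src, dst):
        # f(x) delta_x(y) + src(x) dst(y) / mass, cf. (5.9) and (5.10)
        return [[(f[i] if i == j else 0.0)
                 + (src[i] * dst[j] / mass if mass > 0 else 0.0)
                 for j in range(n)] for i in range(n)]

    gamma_AB = glue(f_A, f_B)
    gamma_BA = glue(f_B, f_A)
    # diagonal matrix of chi_{rho_A > 0} / rho_A
    inv = [[(1.0 / rho_A[i] if i == j and rho_A[i] > 0 else 0.0)
            for j in range(n)] for i in range(n)]
    return matmul(matmul(matmul(matmul(gamma_BA, inv), gamma_AA), inv), gamma_AB)
```

When `rho_B` equals `rho_A`, both gluing plans reduce to the diagonal plan and the function returns `gamma_AA` unchanged, which mirrors the intuition that little mass is moved when the two densities are close.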



5.2 Continuity of the optimal transport functional 



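By combining the technique introduced above with appropriate estimates, we are able to control the change
in $E_{OT}[\rho]$ induced by a small change in $\rho$.

Theorem 5.3. There exists a constant $c_* > 0$ such that for any $\rho_A, \rho_B \in L^1 \cap L^3(\mathbb{R}^3)$ with $\rho_A, \rho_B \ge 0$ and
$\int_{\mathbb{R}^3} \rho_A(x)\,dx = \int_{\mathbb{R}^3} \rho_B(x)\,dx = 1$, the optimal transport functional with Coulomb cost (5.3) satisfies

\[ |E_{OT}[\rho_A] - E_{OT}[\rho_B]| \le c_*\big(\|\rho_A\|_{L^1 \cap L^3(\mathbb{R}^3)} + \|\rho_B\|_{L^1 \cap L^3(\mathbb{R}^3)}\big)\,\|\rho_A - \rho_B\|_{L^1 \cap L^3(\mathbb{R}^3)}, \]

where $\|\rho_i\|_{L^1 \cap L^3(\mathbb{R}^3)} := \max\{\|\rho_i\|_{L^1(\mathbb{R}^3)}, \|\rho_i\|_{L^3(\mathbb{R}^3)}\}$ for $i \in \{A, B\}$.

Proof. Fix arbitrarily two marginals $\rho_A, \rho_B \in L^1 \cap L^3(\mathbb{R}^3)$, with $\rho_A, \rho_B \ge 0$ and $\int \rho_A(x)\,dx = \int \rho_B(x)\,dx = 1$. Let $\gamma_{A \to A} = \gamma^{opt}_{A \to A}$ be an optimal transport plan of $C$ subject to the constraint that $\gamma_{A \to A}$ has equal
marginals $\rho_A$. The main idea is to consider the associated plan $\gamma_{B \to B} = P$ introduced in (5.11) and show
that

\[ C(\gamma_{B \to B}) \le C(\gamma^{opt}_{A \to A}) + c_*\big(\|\rho_A\|_{L^1 \cap L^3(\mathbb{R}^3)} + \|\rho_B\|_{L^1 \cap L^3(\mathbb{R}^3)}\big)\,\|\rho_A - \rho_B\|_{L^1 \cap L^3(\mathbb{R}^3)}. \tag{5.13} \]

By the variational principle for $E_{OT}[\rho_B]$ and the optimality of $\gamma^{opt}_{A \to A}$ this implies

\[ E_{OT}[\rho_B] \le E_{OT}[\rho_A] + c_*\big(\|\rho_A\|_{L^1 \cap L^3(\mathbb{R}^3)} + \|\rho_B\|_{L^1 \cap L^3(\mathbb{R}^3)}\big)\,\|\rho_A - \rho_B\|_{L^1 \cap L^3(\mathbb{R}^3)}, \]

as required (the reverse inequality follows by exchanging the roles of $A$ and $B$).

Step 1: $C(P) \le C(\gamma^{opt}_{A \to A}) + 3M$ with $M = \sup_{y \in \mathbb{R}^3} \int_{\mathbb{R}^3} c(y, w)f_B(w)\,dw$:
By substituting the expressions (5.9) and (5.10) into (5.11), we get

\[ C(P) = \int_{\mathbb{R}^{12}} c(x, w)\Big[f(x)\delta_x(y) + \frac{f_B(x)f_A(y)}{\int_{\mathbb{R}^3} f_A(y)\,dy}\Big]\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}\,\gamma^{opt}_{A \to A}(y, z)\,\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}\Big[f(z)\delta_z(w) + \frac{f_A(z)f_B(w)}{\int_{\mathbb{R}^3} f_B(w)\,dw}\Big]\,dx\,dy\,dz\,dw. \tag{5.14} \]

This is a sum of four terms, which arise by picking one term from each square bracket and carrying out the
integrals over the delta functions:

\[ W_1 = \int_{\mathbb{R}^6} c(y, z)\,f(y)\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}\,\gamma^{opt}_{A \to A}(y, z)\,\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}f(z)\,dy\,dz, \]

\[ W_2 = \int_{\mathbb{R}^9} c(y, w)\,f(y)\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}\,\gamma^{opt}_{A \to A}(y, z)\,\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}\,\frac{f_A(z)f_B(w)}{\int_{\mathbb{R}^3} f_B(w)\,dw}\,dy\,dz\,dw, \]

\[ W_3 = \int_{\mathbb{R}^9} c(x, z)\,\frac{f_B(x)f_A(y)}{\int_{\mathbb{R}^3} f_A(y)\,dy}\,\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}\,\gamma^{opt}_{A \to A}(y, z)\,\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}f(z)\,dx\,dy\,dz \]

and

\[ W_4 = \int_{\mathbb{R}^{12}} c(x, w)\,\frac{f_B(x)f_A(y)}{\int_{\mathbb{R}^3} f_A(y)\,dy}\,\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}\,\gamma^{opt}_{A \to A}(y, z)\,\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}\,\frac{f_A(z)f_B(w)}{\int_{\mathbb{R}^3} f_B(w)\,dw}\,dx\,dy\,dz\,dw. \]

Next, we will estimate each of these four terms. For the first term, we use the simple estimate that $f\chi_{\rho_A > 0} \le \rho_A$, which gives

\[ W_1 \le \int_{\mathbb{R}^6} c(y, z)\,\gamma^{opt}_{A \to A}(y, z)\,dy\,dz = C(\gamma^{opt}_{A \to A}). \]

For the second term, we estimate $\int_{\mathbb{R}^3} c(y, w)f_B(w)\,dw$ by the constant $M$ defined in the statement of Step 1 and $f(y)\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}$ by
1, and we get

\[ W_2 \le \frac{M}{\int_{\mathbb{R}^3} f_B(w)\,dw}\int_{\mathbb{R}^6} \gamma^{opt}_{A \to A}(y, z)\,\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}f_A(z)\,dy\,dz = \frac{M}{\int_{\mathbb{R}^3} f_B(w)\,dw}\int_{\mathbb{R}^3} \chi_{\rho_A > 0}(z)f_A(z)\,dz = M. \]

Analogously, by the change of variables $(y, z, w) \mapsto (z, y, x)$ we have

\[ W_3 \le M. \]

Finally, to bound $W_4$ we estimate $\int_{\mathbb{R}^3} c(x, w)f_B(w)\,dw$ by $M$ and $f_A(y)\frac{\chi_{\rho_A > 0}(y)}{\rho_A(y)}$ by 1, and we obtain

\[ W_4 \le \frac{M}{\big(\int_{\mathbb{R}^3} f_B(w)\,dw\big)^2}\int_{\mathbb{R}^9} f_B(x)\,\gamma^{opt}_{A \to A}(y, z)\,f_A(z)\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}\,dx\,dy\,dz \]
\[ \le \frac{M}{\big(\int_{\mathbb{R}^3} f_B(w)\,dw\big)^2}\int_{\mathbb{R}^3} f_B(x)\,dx\int_{\mathbb{R}^3}\Big(\int_{\mathbb{R}^3} \gamma^{opt}_{A \to A}(y, z)\,dy\Big)f_A(z)\frac{\chi_{\rho_A > 0}(z)}{\rho_A(z)}\,dz = M. \]

Plugging the above bounds for $W_1$, $W_2$, $W_3$, $W_4$ into (5.14) yields the assertion.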
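Step 2. For $g \in L^1 \cap L^3(\mathbb{R}^3)$, we have:

\[ \sup_{x \in \mathbb{R}^3}\Big|\int_{\mathbb{R}^3} \frac{1}{|x - y|}\,g(y)\,dy\Big| \le c_0 \max\{\|g\|_{L^1(\mathbb{R}^3)}, \|g\|_{L^3(\mathbb{R}^3)}\}, \tag{5.15} \]

with $c_0 = 2\big(\tfrac{8\pi}{3}\big)^{1/3}$.

To prove this, we split $\frac{1}{|z|}$ into a short-range and a long-range part,

\[ \frac{1}{|z|} = \frac{\chi_{|z| < a}}{|z|} + \frac{\chi_{|z| \ge a}}{|z|} =: h_s(z) + h_l(z), \]

with the obvious definitions for $h_s$ and $h_l$, and with cut-off parameter $a > 0$ to be chosen later. Note that
$h_s \in L^{3/2}(\mathbb{R}^3)$ and $h_l \in L^\infty(\mathbb{R}^3)$. By Hölder's inequality we have

\[ \Big|\int_{\mathbb{R}^3} \frac{1}{|x - y|}\,g(y)\,dy\Big| = \Big|\int_{\mathbb{R}^3} h_s(x - y)g(y)\,dy + \int_{\mathbb{R}^3} h_l(x - y)g(y)\,dy\Big| \le \|h_s\|_{L^{3/2}(\mathbb{R}^3)}\|g\|_{L^3(\mathbb{R}^3)} + \|h_l\|_{L^\infty(\mathbb{R}^3)}\|g\|_{L^1(\mathbb{R}^3)} \]
\[ \le \|g\|_{L^1 \cap L^3(\mathbb{R}^3)}\big(\|h_s\|_{L^{3/2}(\mathbb{R}^3)} + \|h_l\|_{L^\infty(\mathbb{R}^3)}\big). \]

Explicitly,

\[ \|h_s\|_{L^{3/2}(\mathbb{R}^3)} + \|h_l\|_{L^\infty(\mathbb{R}^3)} = \Big(4\pi\int_0^a r^{2 - 3/2}\,dr\Big)^{2/3} + \frac{1}{a} = \Big(\frac{8\pi}{3}\Big)^{2/3}a + \frac{1}{a}. \]

Minimizing over $a$ in the above gives $a = \big(\tfrac{8\pi}{3}\big)^{-1/3}$, leading to the value of $c_0$ in the assertion.
Step 3. Putting it all together:
By Steps 1 and 2 we have

\[ C(P) \le C(\gamma^{opt}_{A \to A}) + 3c_0\|f_B\|_{L^1 \cap L^3(\mathbb{R}^3)}. \]

But $0 \le f_B \le |\rho_A - \rho_B|$, so $\|f_B\|_{L^1 \cap L^3(\mathbb{R}^3)} \le \|\rho_A - \rho_B\|_{L^1 \cap L^3(\mathbb{R}^3)}$. Since both densities have unit mass, the factor $\|\rho_A\|_{L^1 \cap L^3(\mathbb{R}^3)} + \|\rho_B\|_{L^1 \cap L^3(\mathbb{R}^3)}$ is at least 2. This establishes (5.13) and Theorem 5.3
with $c_* = 3c_0 = 6\big(\tfrac{8\pi}{3}\big)^{1/3}$. □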
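The value of the constant in Step 2 can be checked numerically. The sketch below assumes the splitting of $1/|z|$ at radius $a$ described in the proof; it evaluates the $L^{3/2}$ norm of the short-range part by a midpoint rule, adds the $L^\infty$ norm $1/a$ of the long-range part, and minimizes the sum over a grid of cut-offs. The grid resolution and the function names are illustrative choices.

```python
import math

def short_range_norm(a, n=20000):
    """L^{3/2} norm of chi_{|z|<a}/|z| in R^3: (4 pi int_0^a r^{1/2} dr)^{2/3}."""
    h = a / n
    integral = sum(math.sqrt((i + 0.5) * h) for i in range(n)) * h  # midpoint rule
    return (4.0 * math.pi * integral) ** (2.0 / 3.0)

def bound(a):
    """Closed form of ||h_s||_{3/2} + ||h_l||_inf = (8 pi / 3)^{2/3} a + 1/a."""
    return (8.0 * math.pi / 3.0) ** (2.0 / 3.0) * a + 1.0 / a

# Crude grid search for the optimal cut-off parameter.
grid = [i * 1e-4 for i in range(1, 50001)]
a_best = min(grid, key=bound)

a_closed = (8.0 * math.pi / 3.0) ** (-1.0 / 3.0)      # minimizer from the proof
c0_closed = 2.0 * (8.0 * math.pi / 3.0) ** (1.0 / 3.0)  # resulting constant c_0
```

Numerically the optimal cut-off is about 0.49 and the constant $c_0$ is about 4.06, so $c_* = 3c_0$ is roughly 12.2.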



5.3 Finiteness of kinetic energy 

In this section we investigate the behaviour of derivatives of the combined plan $\gamma_{B \to B} = P$ introduced in
(5.11) when the original plan $\gamma_{A \to A}$ is differentiable.

Recall that $\gamma_{A \to A}$ is a transport plan of $C$ subject to the constraint of equal marginals $\rho_A$, $\gamma_{A \to B}$ was defined
in (5.9), and $\gamma_{B \to A}$ is the reverse plan (5.10). Unlike in the previous section, here $\gamma_{A \to A}$ does not need to be
optimal. Due to the fact that wave functions correspond, up to integrating out variables, to square roots of
pair densities, and the kinetic energy of a wave function is $\frac{1}{2}\int |\nabla\Psi|^2$, we have to show that $\nabla\sqrt{\gamma_{B \to B}} \in L^2$,
in order to be able to construct an admissible trial function with pair density $\gamma_{B \to B}$ in the variational definition
of the Hohenberg-Kohn density functional $F_{HK}$. The following result gives hypotheses under which this is
true. Before stating the result we introduce the following notion which we call strong positivity.

Definition 5.4. A transportation plan $\gamma \in \mathcal{P}(\mathbb{R}^{2d})$ with marginals $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ is called strongly positive
if there exists a constant $\beta > 0$ such that

\[ \gamma \ge \beta\,\mu \otimes \nu. \]

We note that strong positivity implies, in particular, that the support of $\gamma$ is the product of the supports of
its marginals, $\operatorname{supp} \gamma = \operatorname{supp} \mu \times \operatorname{supp} \nu$.

Theorem 5.5. Suppose that pa,Pb > 0, ^ypA, \J Pb G -ff^(M'^), and assume that 7a->.a belongs to the set 
A^ + (R^) (see Section 3) and has equal marginals pA- 

(i) ^J7°/Xa G H^i^^) (smoothness); 

(ii) ^"aXa — f^PA ® Pa for some constant /3 > (strong positivity) . 
Then the plan Jb^b = P defined in i5.11\) satisfies yfP G 



Proof. Plugging formula (|5.10|) for ^b^a into (|5.1ip and using that pa,Pb > and that J^g^a fA{y) dy = 
/r3 fB{y)dy, we have 

N fi^) f opt , ..1a^b{z,w) fB{x) f f , , .IaXa^V^^) I ^J j 

P{x,w)^ — — / 7/_^^(x,z)) dz + - — / / fA{y)^^ — —-iA^B{z,w)dydz. 

Pa{x) Jm? Pa{z) fB{y)dy Jpiz pA{y)PA{z) 

Consequently, 



\Jy/P(x,w) = 



2VP{x,w) 



V/(a;) f opt , ..lA^siz^w) fix) VpAjx) f opt ,^ ^■,-,1a^b{z,w) 



PAix) y«3 - t '^^^^A^ 

v.7jA.(^,-))^^=^rf^ + T^^/ / fAiy)'-^fj^,,A^BizMdydz 

Pa{x) Jk.^ Pa[z) Jus JBiyJdy Jr3 Jji3 pA[y)PA[z) 

= ■.W1+W2 + W3 + Wi, 

with the obvious definitions for W^i, W2,W^3 and W4. We have to show that \Wi{- , w)\'^ dw G i^(R'^) for 
i = 1, . . . ,4. To estimate the first two terms, wc use the following lower bound on P which neglects the 
contribution from in P(x,w). 

fix) f opt I ■,-,1a^b{z,w) fix) 
PAix) Jr3 

It follows that 



P(x, w) > — -- / j')[_,Aix, z)) dz =: — i-rgix, w). (5.16) 

PAix) Jr3 pAiz) PAix) 



irrr / / P^ix) V/(a;) ~ M/^ / P^ix) ( f ix) V p Aix)\ 

\Wi{x,w)\ < -\ , , 7 T^q[x,w) and \Wi\x,w)\ < -\ ^, , , 7 r-r- Q(x,w) 

' - 2V fix)gix,w) PAix)^^ ' ' ' ' - 2y fix)gix,w) \ pAix) pAix) ' ^ 



and hence 



\mM\'<'-;^^9M and | u;)^ < 1 """"^^"^''^^"^ 



Afix)pA{x)' 

Next, due to 'yA^Biz,w)dw = pa{z), we have that 



g{x,w). 



'l{x, w) dw 



Pa{z) 



JR3 



Consequently, using the fact that |V-\/ap = ^ for any function a, we have 



111// <r irzzM 

|W^i(x,K;)|dw< - ^^^^ 



|VV7P and / \W2{x,w)\dw <\ 



1 /(x) |VpA(a:) 



< IVVpII', (5.17) 



where in the last inequality we have used / < pA- Since i/7 = min{^/pA, v^Ps}, and y^pA, y^ps G //^(M''), 
by a standard fact concerning Sobolcv functions we have y/f € 77^ (M'^) and 



Consequently, 



VV7== Xpa>pbV7^+Xpa<pbVVpI a.e.. 
/ |M^i(x,«;)d«;<|Vv/7|2<|VVpi|' + |VVpI|- 

JR3 



(5.18) 



Next we analyze W3. To this end, we make use of the identity 



Together with ()5.16p this yields 

m\ < 



Pa{x) f(x) 



f{x)g{x,w) pa{x 
To estimate the integral over z in the formula above, we write 



lA^A^X^^) "^^yiA^A^^^z) 



^^ a^b{z,w) 
Pa{z) 



dz 



ja^b{z,w) _ h-fA^B{z,w) Ha^b{z,w) 



Pa{z) 



Pa{z) 



pa{z) y 

group one of these factors with "('aI^a^^^ ^) and one with V x\/i'aXa(^^ -^)' and apply the Cauchy-Schwarz 
inequality. This yields 



m\ < 



\ — TTT ^\/9{x,w)J 

V PA[x)g(x,w) V . 



^^xJlA^Aix^z) 



^a~^b{z,w) 



and hence 



fix) 
Pa{x) 



77-1^(2;, z) 



1a^b{z,w) 
Pa{z) 



Pa{z) 



dz. 



dz 



Integrating over w and using that ~ ^ gives 



\W3{x,w)\^dw < 



fix) 

PAix) 



yx\/l7XAi^'Z) 



dz < 



^x'JlA^Ai^^z) 



dz. 



(5.19) 



Finally for is is natural to use a different lower bound for P than the one in (|5.16p , obtained by neglecting 
the first instead of the second term in P(x, w). 



P{x,w) > 



fBix) 



fAiv) 



7T->a(2/:^) 



/r3 fBiv) dy 7r3 7fl3 pAiy)pAiz) 



7A->b(z, w) dydz 



fBix) 



/k3 fBiy) dy 



giw) 



(5.20) 



Substituting this estimate into the definition for W4 immediately gives 



2 ^M^^ lM^fB{y)dy 
and, after squaring, integrating over and using J^^ g{w) dw = /^g dy, we get 

/ \W,ix,w)\dw<l^^I^^\V^\^ (5.21) 
Jr3 4 fB[x) 

But unUke the analogous bounds on Wi,W2,W3, this estimate is insufficient to infer W4 G i7^(M^) since 
VpX, VAb G ^/^^CM-^) do not imply that the function 



V/b = Y XpB>PAiPB - pa) 
belongs to H^. In fact, even when y/pA, J'pb are positive and belong to C°°, \f] need not be in 



Example 5.6. Let = (1 — a; + x^)e and ^^(a-) = (1 + a; + a;^)e ^ . Because 1 ± a; + > i(l + x^) 

is bounded away from zero, we have ^/pj, ^/pi" G H^(M), but ^/xpb>pa{Pb - Pa) = X(o,oo)(a;)y2xe~^^/2 <^ 
since |Vv/Xp«>pa(pb " Pa)? = X(o,oo)(a:) (2^ - 2x + 2^3) e-' ^ Li(K). □ 

Note that this example captures the generic behaviour of / near a point where any two smooth functions pA 
and pb cross. This effect is the reason why the additional assumption (ii) was made in Theorem 15.51 This 
assumption, together with (j5.16p . yields the following alternative lower bound on P 

P{x,w) > ^ / 13pa{x)pa{z) ^^^''1':'"^ dz ^ Pf{x)pB{w). (5.22) 
Pa{x) Jr3 pa[z) 

We fix a number 6 e (0, 1) and we use the lower bounds (|5.20p or (|5.22p . depending on whether /^(a;) > 
Spb{x) or /b(x) < Spb{x). 

Region 1: Assume fB{x) > 5pb{x). Via (|5.20p and (|5.2ip we obtain 
XfB>5pB \W4{-,w)\ dw < — <Tr^XpB>PA- 



Ad PB - 25-^--"- PB 

1 {IIpaI^IJpbT] . 

26^"^'"'-' \ PA ^ PB J - 6 



< -XPB>PA + ^) 4 (IVV^P + iVV^n . (5.23) 



Region 2: Assume /^(a;) < Spb{x). First of all, note that whenever fB{x) > 0, i.e. pb{x) > Pa{x), we 
have the following equivalences 

fnix) < 5pb{x) ^ pb{x) ~ pa{x) < 6pb{x) ^ Pb{x){1 - S) < pa{x) = mm{ p a{x), pb{x)} = f{x). 

Via ([E^ we have 

\W I < 1 |V/b(x)| 

" - 2^/3f{x)pBiw) /r3 fBiy)dy ^^"^'^ 



We split the factor g{w) into \/g{w)y^g{w) and estimate one of the factors via the elementary inequality 
/a (2/) < PA^y), so as to eliminate the bad factor y^ps from the denominator: 

^) ^ [[ [ /lM2I^(^,,^,(,,^),,,y'^(/ / ,j^,(,,.)d,3:^^4^d.)''' 

\jR3jK3pA(y) pa{z) / yJvL^Js? Pa{z) J 



(f P.(.)^^^4^c^.)''^V^. 



Consequently, 



1 |V/B(a;) 



2yW{x)jR3 fB{y)dy 
Squaring, integrating over w and using J^^ g{w)dw = J^^ fB{y)dy yields 

\W,M\^dw < r ,\ ■ (5.24) 

But in the region {x \ fsix) > 0} = {a; | pb{x) > pAix)}, as shown above we have f ~ pa 1^ ^ ^)pb, and 
consequently, 

XfB<SpB / < 7 7 — TTXr 

2/3 / j^^feiyjdy 

^ 1 ^|Vp^|2 , 1 \Vpb\ 



2(3 \ PA 1-5 PB J j^3fB{y)dy 

Combining (5.17), (5.18), (5.19), (5.23) and (5.25) establishes the theorem. □

Remark 5.7. In region 1, the factor appearing in (5.24) is uncontrollably bad. In region 2, the factor
appearing in (5.21) is uncontrollably bad.



5.4 Smoothing 

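The third ingredient needed in the proof of Theorem 5.2 lies in the fact that the Coulomb cost functional
$C[\gamma] = \int |x - y|^{-1}\,d\gamma(x, y)$ is well behaved under smoothing of $\gamma$, despite the fact that the cost function
$|x - y|^{-1}$ is discontinuous and hence does not belong to the dual of the space of probability measures on $\mathbb{R}^6$.

Let $\rho \in \mathcal{R}$ (see (2.11)), and let $\gamma \in \Gamma(\rho, \rho)$ be a minimizer of $C[\gamma] = \int |x - y|^{-1}\,d\gamma(x, y)$ so that

\[ C[\gamma] = E_{OT}[\rho]. \]

We now introduce a standard mollification of $\gamma$, as follows. Let $\phi : \mathbb{R}^3 \to \mathbb{R}$ belong to the Schwartz space
$\mathcal{S}(\mathbb{R}^3)$ of smooth, rapidly decaying functions, and assume that $\phi \ge 0$, $\int_{\mathbb{R}^3} \phi = 1$, $\phi$ radially symmetric. E.g.,
the choice $\phi(x) = \pi^{-3/2}e^{-|x|^2}$ will do. Let

\[ \phi_\epsilon(x) := \frac{1}{\epsilon^3}\,\phi\Big(\frac{x}{\epsilon}\Big) \]

and let $\gamma_\epsilon = (\phi_\epsilon \otimes \phi_\epsilon) * \gamma$, that is to say

\[ \gamma_\epsilon(x, y) = \int_{\mathbb{R}^6} \phi_\epsilon(x - x')\phi_\epsilon(y - y')\,d\gamma(x', y'). \tag{5.26} \]

Proposition 5.8. The mollified pair density $\gamma_\epsilon$ introduced in (5.26) satisfies

(a) $\gamma_\epsilon \in C^\infty(\mathbb{R}^6)$, $\gamma_\epsilon \ge 0$;

(b) $\int \gamma_\epsilon(x, y)\,dx = \rho_\epsilon(y)$, $\int \gamma_\epsilon(x, y)\,dy = \rho_\epsilon(x)$, where $\rho_\epsilon$ is the mollified marginal $(\phi_\epsilon * \rho)(x) = \int_{\mathbb{R}^3} \phi_\epsilon(x - x')\rho(x')\,dx'$;

(c) $C[\gamma_\epsilon] \le C[\gamma]$.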



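Proof. (a): Smoothness is a standard fact concerning mollification of Radon measures, and positivity is
obvious from the positivity of $\phi$.

(b): This follows from the elementary calculation

\[ \int_{\mathbb{R}^3} \gamma_\epsilon(x, y)\,dx = \int_{\mathbb{R}^6}\Big(\int_{\mathbb{R}^3} \phi_\epsilon(x - x')\,dx\Big)\phi_\epsilon(y - y')\,d\gamma(x', y') = \int_{\mathbb{R}^3} \phi_\epsilon(y - y')\rho(y')\,dy'. \]

(c): First, we claim that the cost functional evaluated at the mollified transport plan, $C[\gamma_\epsilon]$, can be interpreted
as a cost functional with modified cost function evaluated at the original transport plan. Indeed, by Fubini's
theorem

\[ C[\gamma_\epsilon] = \int_{\mathbb{R}^6} c(x, y)\Big[\int_{\mathbb{R}^6} \phi_\epsilon(x - x')\phi_\epsilon(y - y')\,d\gamma(x', y')\Big]\,dx\,dy = \int_{\mathbb{R}^6}\underbrace{\Big[\int_{\mathbb{R}^6} \phi_\epsilon(x - x')\phi_\epsilon(y - y')c(x, y)\,dx\,dy\Big]}_{=:\tilde c(x', y')}\,d\gamma(x', y'). \]

The modified cost function $\tilde c(x', y')$ appearing here has an interesting physical meaning which we will exploit
to establish (c), namely it is the electrostatic repulsion between the two charge distributions $\phi_\epsilon(\cdot - x')$ and
$\phi_\epsilon(\cdot - y')$ (i.e., the charge distributions centered at $x'$ respectively $y'$ whose profile is given by the mollifier
$\phi_\epsilon$). Now it is a standard fact going back to Newton that the electrostatic potential exerted by a radial
charge distribution on a point outside it equals the potential exerted by the same amount of charge placed
at the centre:

\[ \frac{1}{|S_r|}\int_{S_r} \frac{1}{|x - a|}\,dH^2(x) = \frac{1}{\max\{|a|, r\}}, \]

where $S_r$ denotes the sphere of radius $r$ around 0, $H^2$ is the Hausdorff measure (area element) on the
sphere, and $|S_r|$ ($= 4\pi r^2$) is the total area of the sphere. This together with the radial symmetry of $\phi_\epsilon$ (i.e.,
$\phi_\epsilon(x) = \tilde\phi_\epsilon(|x|)$ for some function $\tilde\phi_\epsilon$) gives

\[ \int_{\mathbb{R}^3} \frac{\phi_\epsilon(x)}{|x - a|}\,dx = \int_0^\infty |S_r|\,\tilde\phi_\epsilon(r)\,\frac{1}{|S_r|}\int_{S_r} \frac{1}{|x - a|}\,dH^2(x)\,dr = \int_0^\infty |S_r|\,\tilde\phi_\epsilon(r)\,\frac{1}{\max\{|a|, r\}}\,dr \le \frac{1}{|a|}. \tag{5.27} \]

Hence by repeated application of (5.27)

\[ \tilde c(x', y') = \int_{\mathbb{R}^6} \frac{\phi_\epsilon(x)\phi_\epsilon(y)}{|x + x' - (y + y')|}\,dx\,dy \tag{5.28} \]
\[ \le \int_{\mathbb{R}^3} \phi_\epsilon(y)\,\frac{1}{|y - (x' - y')|}\,dy \tag{5.29} \]
\[ \le \frac{1}{|x' - y'|}. \tag{5.30} \]

This establishes (c). □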
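Newton's theorem, which drives the estimate in part (c), is easy to verify numerically. The sketch below is only an illustration with hypothetical function names: it averages $1/|x - a|$ over the sphere of radius $r$ using the substitution $u = \cos\theta$, which turns the surface average into $\tfrac12\int_{-1}^{1}(r^2 + |a|^2 - 2|a|ru)^{-1/2}\,du$, and compares the result with $1/\max\{|a|, r\}$.

```python
import math

def sphere_average(a_norm, r, n=20000):
    """Average of 1/|x - a| over the sphere |x| = r, midpoint rule in u = cos(theta)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        u = -1.0 + (i + 0.5) * h
        total += 1.0 / math.sqrt(r * r + a_norm * a_norm - 2.0 * a_norm * r * u)
    return 0.5 * h * total

def newton_value(a_norm, r):
    """Right hand side of Newton's theorem."""
    return 1.0 / max(a_norm, r)
```

The quadrature assumes $|a| \ne r$, so that the integrand stays bounded. Since the average never exceeds $1/|a|$, integrating it against any nonnegative radial profile of unit mass cannot exceed $1/|a|$ either, which is the inequality used in the proof.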



5.5 Passage to the limit 

We are now in a position to give the 

Proof of Theorem ] 5. 2[ Let p be any density in TZ. Recall that p G TZ implies that y/p, Vy^ G L^(R'^) and 
hence, by the Sobolev embedding theorem, y/p e L^(R'^), whence p G D L^{R^). 

We have to show that limfi_j.o Fhk [p] = Eqt [p] ■ We will do so via the following strategy: 



• Start from an optimal transport plan $\gamma$ with marginals $\rho$. 

• Smooth it. 

• Make it strongly positive (see Definition 5.4), by mixing in a small amount of the mean field (i.e., 
tensor product) plan. 

• Re-instate the marginal constraint, via the technique introduced in Section 5.1. 

• Infer from Theorem 5.5 that, unlike the original optimal transport plan $\gamma$, the so-obtained modified 
plan $P$ is the pair density (2.6) of a wave function with square-integrable gradient. 

• Pass to the semiclassical limit, by careful error estimates on the three modification steps listed above 
(smoothing, achieving strong positivity, re-instating the constraint). 

We now implement this strategy in detail. Let $\gamma$ be an optimal transport plan of the Coulomb cost functional 
$C$ subject to equal marginals $\rho$. (Of course we know from Section 3 that $\gamma$ is unique, but uniqueness is not 
needed here.) For $\epsilon > 0$, let $\gamma_\epsilon$ be its mollification (5.26). By Proposition 5.8 its right and left marginals 
are given by the mollification $\rho_\epsilon = \phi_\epsilon * \rho$ of the density $\rho$. Finally we introduce the "strong positivization" 

$$\gamma_{\epsilon,\beta} := (1-\beta)\,\gamma_\epsilon + \beta\,\rho_\epsilon \otimes \rho_\epsilon,$$

where $\beta \in (0, 1)$. Note that $\gamma_{\epsilon,\beta}$ has the same marginals as $\gamma_\epsilon$, regardless of the value of $\beta$. 
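A finite-dimensional analogue may help to see why mixing in the product plan keeps the marginals fixed while removing all zeros. The sketch below is an illustration under stated assumptions, not the construction of the paper: plans are replaced by nonnegative matrices of total mass 1, the helper names `marginals` and `strong_positivization` are hypothetical, and only the entrywise lower bound by $\beta$ times the product plan is demonstrated (the precise notion of strong positivity is the one in the definition cited above).

```python
def marginals(plan):
    """Row and column sums of a discrete plan given as a list of rows."""
    rows = [sum(row) for row in plan]
    cols = [sum(col) for col in zip(*plan)]
    return rows, cols

def strong_positivization(plan, beta):
    """Return (1 - beta) * plan + beta * (row marginal) x (column marginal).

    Assumes the plan has total mass 1, so that the product plan has the same marginals.
    """
    rows, cols = marginals(plan)
    return [[(1.0 - beta) * plan[i][j] + beta * rows[i] * cols[j]
             for j in range(len(cols))]
            for i in range(len(rows))]

# Hypothetical example: a plan concentrated on a cyclic shift (a discrete "transport map"),
# which has many zero entries and uniform marginals.
n = 4
plan = [[(1.0 / n if j == (i + 1) % n else 0.0) for j in range(n)] for i in range(n)]
mixed = strong_positivization(plan, beta=0.1)

print(marginals(plan))
print(marginals(mixed))
print(min(min(row) for row in mixed))
```

The marginals printed for `plan` and `mixed` agree up to rounding, and the smallest entry of `mixed` is strictly positive for every choice of `beta` in (0, 1).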

Observe now that the transportation plan $\gamma_{\epsilon,\beta}$ and the densities $\rho_\epsilon$, $\rho$ satisfy the assumptions of Theorem 5.5. 
Consequently, by Theorem 5.5 there exists a transportation plan $P_{\epsilon,\beta}$ with marginals $\rho$ (i.e., with re-instated 
constraint) whose square root belongs to $H^1(\mathbb{R}^6)$. 

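Now comes the only step where we use the assumption $N = 2$. In this case we can achieve the (otherwise 
highly nontrivial) antisymmetry condition on $\Psi$ appearing in (2.2) purely by means of an antisymmetric spin 
part. More precisely we define $\Psi : (\mathbb{R}^3 \times \mathbb{Z}_2)^2 \to \mathbb{C}$ by 

$$\Psi(x,s,y,t) := \sqrt{P_{\epsilon,\beta}(x,y)}\;\frac{\alpha(s)\beta(t) - \beta(s)\alpha(t)}{\sqrt{2}},$$

where $\alpha, \beta : \mathbb{Z}_2 = \{\pm\tfrac12\} \to \mathbb{C}$ are given by $\alpha(s) = \delta_{1/2}(s)$, $\beta(s) = \delta_{-1/2}(s)$. Then it is straightforward to 
check that $\Psi$ belongs to the admissible set $\mathcal{A}$ defined in (2.2) and its pair density, density, and kinetic energy 
are 

$$\rho_2^\Psi = P_{\epsilon,\beta}, \qquad \rho^\Psi = \rho, \qquad T_\hbar[\Psi] = \frac{\hbar^2}{2}\int |\nabla\sqrt{P_{\epsilon,\beta}}|^2.$$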
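The role of the spin part can be made concrete with a small check. The sketch below is only an illustration: the names `spin_part` and `psi` are hypothetical, the factor $1/\sqrt{2}$ is an assumed normalization, and the spatial factor is a stand-in symmetric function rather than the actual $\sqrt{P_{\epsilon,\beta}}$. It verifies that the singlet combination of $\alpha$ and $\beta$ changes sign under exchange of the two spins, so that a symmetric spatial factor times this spin part is antisymmetric under exchange of the two particles.

```python
import math

SPINS = (0.5, -0.5)  # the two elements of Z_2 = {+1/2, -1/2}

def alpha(s):
    """Spin-up indicator: delta_{1/2}(s)."""
    return 1.0 if s == 0.5 else 0.0

def beta(s):
    """Spin-down indicator: delta_{-1/2}(s)."""
    return 1.0 if s == -0.5 else 0.0

def spin_part(s, t):
    """Antisymmetric (singlet) spin function; the 1/sqrt(2) normalization is an assumption."""
    return (alpha(s) * beta(t) - beta(s) * alpha(t)) / math.sqrt(2.0)

def psi(x, s, y, t, spatial):
    """Two-particle wave function: symmetric spatial factor times antisymmetric spin part."""
    return spatial(x, y) * spin_part(s, t)

def example_spatial(x, y):
    """Stand-in symmetric spatial factor (hypothetical, one-dimensional for brevity)."""
    return math.exp(-(x * x + y * y)) * (1.0 + (x - y) ** 2)

# Exchange of the two particles (x, s) <-> (y, t) flips the sign of psi.
for s in SPINS:
    for t in SPINS:
        print(s, t, psi(0.3, s, -1.1, t, example_spatial), psi(-1.1, t, 0.3, s, example_spatial))

# Summing |spin_part|^2 over both spins gives 1, so the spin sum of |psi|^2 is spatial^2.
print(sum(spin_part(s, t) ** 2 for s in SPINS for t in SPINS))
```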

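It follows that 

$$\lim_{\hbar\to 0} F_{HK}[\rho] \le \lim_{\hbar\to 0}\bigl(T_\hbar[\Psi] + V_{ee}[\Psi]\bigr) = V_{ee}[\Psi] = C[P_{\epsilon,\beta}]. \tag{5.31}$$

Next, the estimate for re-instating the marginal constraint yields 

$$C[P_{\epsilon,\beta}] \le C[\gamma_{\epsilon,\beta}] + c_*\bigl(\|\rho\|_{L^1\cap L^3} + \|\rho_\epsilon\|_{L^1\cap L^3}\bigr)\,\|\rho - \rho_\epsilon\|_{L^1\cap L^3}. \tag{5.32}$$

Next, we claim that 

$$C[\gamma_{\epsilon,\beta}] = (1-\beta)\,C[\gamma_\epsilon] + \beta\,C[\rho_\epsilon\otimes\rho_\epsilon] \le C[\gamma_\epsilon] + c_0\,\beta\,\|\rho_\epsilon\|_{L^1}\|\rho_\epsilon\|_{L^1\cap L^3}. \tag{5.33}$$

This is immediate from the estimate 

$$|C[f\otimes g]| \le c_0\,\|f\|_{L^1}\|g\|_{L^1\cap L^3} \quad \text{for any } f, g \in L^1\cap L^3,$$

which follows by applying (5.15), multiplying by $f$, and integrating over $x$. 
Finally, we will need the following bound which was established in Proposition 5.8: 

$$C[\gamma_\epsilon] \le C[\gamma]. \tag{5.34}$$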



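Combining the estimates (5.31) - (5.34) yields 

$$\lim_{\hbar\to 0} F_{HK}[\rho] \le C[\gamma] + c_0\,\beta\,\|\rho_\epsilon\|_{L^1}\|\rho_\epsilon\|_{L^1\cap L^3} + c_*\bigl(\|\rho\|_{L^1\cap L^3} + \|\rho_\epsilon\|_{L^1\cap L^3}\bigr)\,\|\rho - \rho_\epsilon\|_{L^1\cap L^3}.$$

Letting $\beta$ and $\epsilon$ tend to zero and using that $\rho_\epsilon$, being the mollification $\phi_\epsilon * \rho$ of $\rho$, tends to $\rho$ in $L^1 \cap L^3$ as 
$\epsilon \to 0$ yields 

$$\lim_{\hbar\to 0} F_{HK}[\rho] \le C[\gamma] = E_{OT}[\rho].$$

The reverse inequality is immediate from (5.4) and the positivity of $T_\hbar$. This completes the proof of Theorem 5.2. 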



A Appendix 

Lemma A.1 (Legendre transforms on the line). Let $l : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ be lower semi-continuous and 
convex. Define its dual function $l^\circ$ by (3.10). Then $l^\circ$ satisfies the same hypotheses as $l$, and 

(i) $(\lambda, \xi) \in \partial_\cdot l$ if and only if $(\xi, \lambda) \in \partial_\cdot l^\circ$; 

(ii) the dual function of $l^\circ$ is $l$, that is $l = l^{\circ\circ}$; 

(iii) strict convexity of $l$ implies $l^\circ$ differentiable, where it is subdifferentiable; 

(iv) $l^\circ$ is non-increasing if and only if $l(\lambda) = \infty$ for all $\lambda > 0$. 

Proof. (i)-(iii) follow from the corresponding statements in Theorem A.1 in [GM96]. Assertion (iv) is easily 
proved similarly to Theorem A.3 (iv) in [GM96]. To verify the only if implication, suppose that $l(\lambda)$ is finite 
at some $\lambda > 0$; we shall show that $l^\circ$ increases somewhere. Being convex, $l$ must be subdifferentiable at $\lambda$ (or 
some nearby point): $(\lambda, \xi) \in \partial_\cdot l$. Then (i) implies that $l^\circ$ is finite at $\xi$, and increasing: $l^\circ(\xi + \epsilon) \ge l^\circ(\xi) + \lambda\epsilon$ 
for some $\epsilon > 0$. 

To prove the converse, suppose that $l^\circ$ increases somewhere. Then one has $(\xi, \lambda) \in \partial_\cdot l^\circ$ for some $\xi \in \mathbb{R}$ and 
$\lambda > 0$. Invoking once again (i) gives $(\lambda, \xi) \in \partial_\cdot l$, from which one concludes finiteness of $l(\lambda)$. □ 

For $x \in \mathbb{R}^d \setminus \{0\}$, denote by $\hat{x} := x/|x|$ the unit vector in the direction of $x$. 

Lemma A.2 (subdifferentiability of the cost). Let $l : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ be convex and non-increasing on 
$[0, \infty)$ and define $h(x) := l(|x|)$ on $\mathbb{R}^d$. Unless $h$ is a constant: $(x, y) \in \partial_\cdot h$ if and only if $(|x|, -|y|) \in \partial_\cdot l$ 
with $y = -|y|\hat{x}$ and $x \ne 0$. 

Proof. Fix $x \in \mathbb{R}^d \setminus \{0\}$ and suppose $l(\lambda)$ admits $\xi$ as a subderivative at $|x|$: $(|x|, \xi) \in \partial_\cdot l$. Since $l$ is convex 
and non-increasing, $\xi \le 0$, while for $\epsilon \in \mathbb{R}$, 

$$l(|x| + \epsilon) \ge l(|x|) + \epsilon\xi. \tag{A.1}$$

Let 

$$\epsilon := |x + v| - |x| = \sqrt{|x|^2 + 2\langle x, v\rangle + |v|^2} - |x| \le \langle \hat{x}, v\rangle + \frac{|v|^2}{2|x|}, \tag{A.2}$$

which inequality follows from $\sqrt{1 + \lambda} \le 1 + \lambda/2$. Now $h(x + v) = l(|x + v|) \ge l(|x| + \epsilon)$, with $\epsilon = \langle v, \hat{x}\rangle + o(|v|)$, 
as seen from (A.2). It follows immediately from the definition of subdifferentiability in Section 3 that $h$ is subdifferentiable at $x$, with 
$(x, \xi\hat{x}) \in \partial_\cdot h$. On the other hand, $h$ cannot be subdifferentiable at the origin as $h(0) = \infty$. 

Now let $(x, y) \in \partial_\cdot h$, so $x \ne 0$ and for small $v \in \mathbb{R}^d$ 

$$h(x + v) \ge h(x) + \langle v, y\rangle + o(|v|).$$

Spherical symmetry of $h$ forces $y$ to be parallel to $x$: otherwise, a slight rotation $x + v := x\cos\theta - z\sin\theta$ of 
$x$ in the direction $z := y - \langle y, \hat{x}\rangle\hat{x}$ would contradict $h(x + v) = h(x)$ for $\theta$ sufficiently small. Moreover, 
taking $v = \epsilon\hat{x}$ yields (A.1) with $\xi := \langle \hat{x}, y\rangle + o(1)$, which concludes the lemma: $|y| = \pm\langle \hat{x}, y\rangle$ holds 
with a minus sign since $l$ cannot increase. □ 

Lemma A.3 (uniform subdifferentiability of the cost). Let $l$ and $h$ be defined as in the lemma above. Then 
$h$ is subdifferentiable on $\mathbb{R}^d \setminus \{0\}$. Moreover, for $\delta > 0$, there is a real function $O_\delta(\lambda)$ tending to zero linearly 
with $|\lambda|$ such that $|x| > \delta$, $y \in \partial_\cdot h(x)$ and $v \in \mathbb{R}^d$ imply 

$$h(x + v) \ge h(x) + \langle v, y\rangle + O_\delta(|v|^2). \tag{A.3}$$

Proof. For $\lambda > 0$, the convex function $l$ admits a subgradient $\xi \in \partial_\cdot l(\lambda)$: for example, take its right derivative 
$\xi = l'(\lambda^+)$. If $|x| = \lambda$, the lemma implies $(x, \xi\hat{x}) \in \partial_\cdot h$, so $h$ is subdifferentiable at $x$. 

Now suppose that $(x, y) \in \partial_\cdot h$. The opposite implication of the lemma yields $y = \xi\hat{x}$ with $(|x|, \xi) \in \partial_\cdot l$ so 
(A.1) holds. Moreover, $\xi \le 0$. If $v \in \mathbb{R}^d$, then $h(x + v) \ge l(|x| + \epsilon)$ where $\epsilon$ is as in (A.2). By convexity of 
$l$, its right derivative is a non-decreasing function of $\lambda$. Assume $|x| > \delta$ so that $\xi \ge l'(\delta^+)$. Together with 
(A.1) and (A.2), this assumption gives 

$$h(x + v) \ge h(x) + \langle \xi\hat{x}, v\rangle + |v|^2\,\frac{l'(\delta^+)}{2\delta}.$$

□ 
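The two inequalities (A.2) and (A.3) can be tested numerically for a concrete cost. The sketch below assumes the Coulomb choice $l(\lambda) = 1/\lambda$, so that $h(x) = 1/|x|$, $\xi = -1/|x|^2$ and $l'(\delta^+) = -1/\delta^2$; this choice, the dimension 3, the sampling ranges, and the helper names `a2_gap`, `a3_gap` and `sample` are illustrative assumptions, not part of the lemmas.

```python
import math
import random

def norm(x):
    return math.sqrt(sum(c * c for c in x))

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def a2_gap(x, v):
    """Right-hand side minus left-hand side of (A.2); expected to be >= 0."""
    r = norm(x)
    eps = norm(add(x, v)) - r
    return dot(x, v) / r + dot(v, v) / (2.0 * r) - eps

def a3_gap(x, v, delta):
    """h(x+v) minus the lower bound of (A.3) for l(lambda) = 1/lambda; expected to be >= 0."""
    r = norm(x)
    xi = -1.0 / r ** 2                 # subgradient of l at |x|
    l_prime_delta = -1.0 / delta ** 2  # right derivative of l at delta
    lower = 1.0 / r + xi * dot(x, v) / r + dot(v, v) * l_prime_delta / (2.0 * delta)
    return 1.0 / norm(add(x, v)) - lower

def sample(delta, rng):
    """Draw x with |x| > delta and an arbitrary displacement v in R^3."""
    while True:
        x = [rng.uniform(-3.0, 3.0) for _ in range(3)]
        if norm(x) > delta:
            break
    v = [rng.uniform(-2.0, 2.0) for _ in range(3)]
    return x, v

rng = random.Random(0)
delta = 0.5
pairs = [sample(delta, rng) for _ in range(2000)]
worst_a2 = min(a2_gap(x, v) for x, v in pairs)
worst_a3 = min(a3_gap(x, v, delta) for x, v in pairs)
print(worst_a2, worst_a3)
```

Both printed minima should be nonnegative up to rounding error; a negative value well below rounding level would indicate a sign or normalization mistake in the reconstruction of the bound.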

Lemma A.4. Let $l$ and $h$ be defined as in Lemma A.2. Define the dual function $h^* : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ 
via (3.10). Then for some $R \ge 0$, 

(i) $h^*(y)$ is continuously differentiable on $|y| > R$ while $h^* = +\infty$ on $|y| < R$; 

(ii) $(y, x) \in \partial_\cdot h^*$ with $x \ne 0$ if and only if $(x, y) \in \partial_\cdot h$ with $y \ne 0$; 

(iii) if $(y, x) \in \partial_\cdot h^*$, then $x = \nabla h^*(y)$. 

Proof. The proof follows the same reasoning as the proof of Proposition A.6 (i)-(iii) from [GM96] and will 
be omitted. □ 